Customer Categorization for Load Forecasting

SYST/OR 699 Capstone Project


NOVEC currently purchases power from power suppliers according to customer demand. Usually some of the purchased power is never consumed by any of NOVEC's customers. This increases the cost of providing power to customers, because NOVEC is ordering more power than is needed. NOVEC is developing a system that can better forecast customer usage, and it understands that dividing customers into more than the two current categories, residential and non-residential, can increase the accuracy of its load forecast. NOVEC has requested a better algorithm for customer categorization that can support more accurate forecasting in the future.

NOVEC is also interested in learning whether the survey meter population is representative of its overall customer population.


Devise an algorithm that can evaluate categorization methods for NOVEC's current and prospective customers. Using the resulting categories will help NOVEC's current forecasting system forecast load more accurately, which will reduce the cost of energy purchased from other power suppliers.


The project was divided into two phases: evaluate the current Billing Rate Code (BRC) categorization for self-consistency, and develop new categorizations if behavior within BRC groupings proved too varied. The team took an iterative approach, applying data clustering algorithms and evaluating the resulting clusters as customer categories using various internally derived quantitative metrics to condense the data. Exploring BRC as a categorization/clustering variable led to the development of a metric called Hourly Departure From Typical (HDFT). HDFT scales each customer's power consumption to a percentage of that customer's own average, then finds the mean and standard deviation of the scaled value at each individual reading time within a BRC grouping. If the number of a customer's data points falling more than two standard deviations from the group mean was too high, that customer was deemed not to have consumption behavior similar to the rest of the group.

In phase two, the team evaluated customers based on their average power consumption, masking the data so that the average was calculated only for specific time regions (summer-only data, weekday-only data, etc.) and comparing that average to the average over the complement data set that did not match the filter. If the ratio of those two averages was similar for two customers, they could be said to have similar consumption behavior on that time scale.
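The HDFT calculation described above can be sketched roughly as follows. This is a minimal illustration, not the team's implementation: the function name `hdft_outlier_fraction`, the array layout (one row per customer, one column per reading time), the use of the population standard deviation, and the 25% flagging cutoff in the usage example are all assumptions, since the summary says only that the count of outlying points was "too high".

```python
import numpy as np

def hdft_outlier_fraction(usage, n_sigma=2.0):
    """Fraction of each customer's readings that fall outside n_sigma
    standard deviations of the group mean at that reading time.

    usage: array of shape (customers, reading_times) for ONE grouping
    (for example, one BRC). The layout is an assumption of this sketch.
    """
    usage = np.asarray(usage, dtype=float)
    # Scale each customer's readings to a fraction of that customer's own average.
    scaled = usage / usage.mean(axis=1, keepdims=True)
    # Mean and standard deviation across customers at each reading time.
    mu = scaled.mean(axis=0)
    sigma = scaled.std(axis=0)
    # Flag readings more than n_sigma standard deviations from the group mean.
    outside = np.abs(scaled - mu) > n_sigma * sigma
    # Share of flagged readings per customer.
    return outside.mean(axis=1)

# Usage: nine flat-profile customers plus one with a very different shape.
group = np.vstack([np.full((9, 4), 5.0), [[2.0, 0.0, 1.0, 1.0]]])
fractions = hdft_outlier_fraction(group)
# Hypothetical cutoff: flag customers with more than 25% of readings outside.
dissimilar = fractions > 0.25
```

Because each customer is scaled by their own average, the metric compares the shape of consumption over time rather than its absolute size, so a large and a small customer with the same daily pattern are treated as similar.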

Applying that heuristic to seasonal variation, day-of-week variation, and daytime versus nighttime consumption, the team developed 24 labeled consumption classes tracking consumption ratios on each of the three time scales. After reorganizing the data into the new class bins, the team reapplied the HDFT metric described above to determine the accuracy of the class assignments.
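As a rough sketch of the ratio heuristic, the code below computes the masked-to-complement average ratio on each of the three time scales and bins the ratios into a class label. Everything specific here is an assumption for illustration: the summary does not say which months count as summer, which hours count as daytime, where the bin edges lie, or how the 24 classes are divided among the three time scales, so the function names, mask definitions, and edges are hypothetical and do not reproduce the team's 24 classes.

```python
import numpy as np

def mask_ratio(usage, mask):
    """Mean usage where mask is True divided by mean usage in the complement."""
    usage = np.asarray(usage, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    return usage[mask].mean() / usage[~mask].mean()

def time_masks(timestamps):
    """Build one boolean mask per time scale from hourly timestamps.
    The month and hour ranges are illustrative assumptions."""
    ts = np.asarray(timestamps, dtype="datetime64[h]")
    days = ts.astype("datetime64[D]")
    month = ts.astype("datetime64[M]").astype(int) % 12 + 1
    # 1970-01-01 was a Thursday, so this yields Monday == 0 ... Sunday == 6.
    weekday = (days.astype(int) + 3) % 7
    hour = (ts - days).astype(int)
    return {
        "season": (month >= 6) & (month <= 8),       # assumed: June to August
        "day_of_week": weekday < 5,                  # Monday to Friday
        "time_of_day": (hour >= 8) & (hour < 20),    # assumed: 08:00 to 19:59
    }

def consumption_class(usage, timestamps, edges):
    """Bin the three ratios with caller-supplied (hypothetical) edges and
    return one bin index per time scale as the class label."""
    masks = time_masks(timestamps)
    scales = ("season", "day_of_week", "time_of_day")
    return tuple(
        int(np.digitize(mask_ratio(usage, masks[s]), edges[s])) for s in scales
    )

# Usage: one year of hourly data for a customer who uses 3x more by day.
ts = np.arange("2012-01-01T00", "2013-01-01T00", dtype="datetime64[h]")
hour_of_day = np.arange(ts.size) % 24
usage = np.where((hour_of_day >= 8) & (hour_of_day < 20), 3.0, 1.0)
edges = {s: [0.9, 1.1] for s in ("season", "day_of_week", "time_of_day")}
label = consumption_class(usage, ts, edges)
```

Two customers that receive the same label have similar ratios on all three time scales, which is the sense of "similar consumption behavior" used in the text. The input must contain readings on both sides of every mask, otherwise a ratio is undefined.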

Results and Measure of Success

The final report includes a suggested listing of categories with class assignments for all customers in the sample population, along with greater detail about the approach and additional intermediate analysis metrics. While the project members did not have direct access to the SASS model NOVEC employs to conduct purchasing predictions, they shared early results of assignment pairings with NOVEC personnel. Initial evaluation by NOVEC personnel indicated a strong possibility of improved prediction accuracy. NOVEC staff will track the accuracy of future predictions going forward, and the major stakeholders at NOVEC can be contacted via their website for further inquiries.