Project Description

NOVEC has five years of data, from 2011 to 2015, on daily electricity usage for about 850 customers. The data has a Customer Group attribute, which identifies the type of client: Residential, Small Commercial, or Large Commercial. It also has a unique Map Location Number that gives the geospatial location of the client, and an Account Number that can indicate whether the client has changed. The project studies this data to segment the customers into groups with similar consumption patterns. The final goal is to find the optimum number of clusters, segment the customers using the k-means algorithm, and validate the uniqueness of each segment.

With the amount of data available, it is important to scope the problem into more manageable parts. Currently, NOVEC's peak electricity usage happens in the month of July, and the daily peak in electricity usage occurs around 7 p.m. Since electricity usage changes from month to month with seasonal fluctuations and temperature changes, the initial focus for this project will be the month of July. Then, using the same clustering technique, the project will segment the consumers' January usage with respect to NOVEC's total peak consumption and validate the consumer segments by looking at their load profiles.
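The approach described above (choose a number of clusters, segment with k-means, then inspect the segments) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not NOVEC's data or the project's actual implementation: the 24-hour profiles are synthetic, the three profile shapes and the function names (synthetic_profiles, kmeans, silhouette) are hypothetical, the initialisation is a deterministic farthest-point rule rather than the usual random start, and the mean silhouette score is just one possible criterion for picking the number of clusters.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two usage profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def synthetic_profiles(seed=42, per_group=20):
    """Hypothetical hourly load shapes plus noise (NOT NOVEC data)."""
    rng = random.Random(seed)
    residential = [3.0 if 17 <= h <= 21 else 1.0 for h in range(24)]  # evening peak
    small_comm = [4.0 if 8 <= h <= 16 else 0.5 for h in range(24)]    # business hours
    large_comm = [6.0] * 24                                           # flat, high load
    profiles = []
    for shape in (residential, small_comm, large_comm):
        for _ in range(per_group):
            profiles.append([v + rng.uniform(-0.2, 0.2) for v in shape])
    return profiles


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with farthest-point initialisation (an assumption,
    chosen so the sketch is reproducible without random restarts)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def silhouette(points, labels):
    """Mean silhouette score; higher means tighter, better separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1 or len(clusters) < 2:
            scores.append(0.0)
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


profiles = synthetic_profiles()
scores = {k: silhouette(profiles, kmeans(profiles, k)[0]) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
labels, centroids = kmeans(profiles, best_k)
print("silhouette by k:", {k: round(s, 3) for k, s in scores.items()})
print("chosen number of clusters:", best_k)
```

The resulting centroids are the average load profile of each segment, so plotting or printing them is one simple way to check whether the segments look distinct. On real data the profiles would typically be scaled first, and an elbow plot of within-cluster variance could be used alongside the silhouette score.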

Problem Statement

NOVEC has customer data from a stratified random sample of all of its customers. NOVEC would like to determine whether this stratified sample can be used to segment its customers by their contribution towards NOVEC's peak demand and total energy purchases. NOVEC would also like to know the recommended number of segments and the characteristics of those segments.