With the deployment of smart meters, electricity consumption data can be collected for individual consumer buildings at high temporal resolution. The availability of such data has made it possible to study the daily load demand profiles of households. Clustering households based on their demand profiles is a fundamental yet essential component of such analysis. While many clustering algorithms and frameworks can be used for this task, they usually produce very different clusters. To identify the best clustering result, various cluster validation indices (CVIs) have been proposed in the literature. However, different CVIs often recommend different algorithms, which raises the problem of identifying the most suitable CVI for a given dataset. In response, this paper shows how the recommendations of validation indices are influenced by different data characteristics that may be present in a typical residential load demand dataset. Furthermore, the paper identifies the data features that favor or discourage the use of a particular cluster validation index.