Data-driven building energy prediction is an integral part of measurement and verification, building benchmarking, and building-to-grid interaction. The ASHRAE Great Energy Predictor III (GEPIII) machine learning competition used an extensive meter data set to crowdsource the most accurate machine learning workflow for whole-building energy prediction. A significant component of the winning solutions was the pre-processing phase, which removed anomalous training data. Contemporary pre-processing methods rely either on statistical threshold filtering or on deep learning methods that require training data and multiple hyper-parameters. A recent method named ALDI (Automated Load profile Discord Identification) identified these discords using the matrix profile, but the technique still requires user-defined parameters. We develop ALDI++, a method based on the previous work that bypasses user-defined parameters and takes advantage of discord similarity. We evaluate ALDI++ against a statistical threshold, a variational auto-encoder, and the original ALDI as baselines in discord classification and energy forecasting scenarios. Our results demonstrate that, while the improvement in classification performance over the original method is marginal, ALDI++ helps achieve the lowest forecasting error, a 6% improvement over the winning team's approach, with six times less computation time.
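To make the idea of a matrix-profile discord concrete, the following is a minimal sketch and not the ALDI or ALDI++ implementation. It assumes hourly meter readings, splits them into non-overlapping daily windows (a simplification of the sliding-window matrix profile), and flags the day whose z-normalized shape is farthest from its nearest neighbor. The function names, the synthetic load curve, and the injected flat day are all hypothetical and serve illustration only.

```python
import numpy as np


def znorm(window):
    """Z-normalize a 1-D window; a constant window maps to all zeros."""
    std = window.std()
    if std == 0:
        return np.zeros_like(window, dtype=float)
    return (window - window.mean()) / std


def daily_matrix_profile(load, day_len=24):
    """Simplified matrix profile over non-overlapping daily windows.

    For each day, return the z-normalized Euclidean distance to the
    most similar *other* day. A large value marks a candidate discord.
    """
    n_days = len(load) // day_len
    days = np.array(
        [znorm(load[i * day_len:(i + 1) * day_len]) for i in range(n_days)]
    )
    # Pairwise Euclidean distances between all daily profiles.
    diff = days[:, None, :] - days[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Exclude the trivial match of each day with itself.
    np.fill_diagonal(dist, np.inf)
    return dist.min(axis=1)


# Synthetic example (assumed data): 14 days with a sinusoidal daily shape,
# plus one flat, noisy day standing in for an anomalous meter reading.
rng = np.random.default_rng(0)
hours = np.arange(24)
typical = 50 + 30 * np.sin((hours - 6) / 24 * 2 * np.pi)
load = np.concatenate([typical + rng.normal(0, 1, 24) for _ in range(14)])
load[5 * 24:6 * 24] = 50.0 + rng.normal(0, 1, 24)

profile = daily_matrix_profile(load)
discord_day = int(np.argmax(profile))
print("Most anomalous day index:", discord_day)
```

In this sketch the discord is simply the day with the largest profile value. Deciding how many days to discard from such a profile is the kind of user-defined choice that the abstract says ALDI++ is designed to avoid.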