The digitization of energy infrastructure enables new data-driven applications, often supported by machine learning (ML) models. However, domain-specific data transformation, pre-processing, and management in modern data-driven pipelines have yet to be addressed. In this paper, we present a first study of data models, energy feature engineering, and feature management solutions for developing ML-based energy applications. We first propose a taxonomy for designing data models suitable for energy applications, then analyze feature engineering techniques that transform the data model into features suitable for ML model training, and finally analyze available feature store designs. Using a short-term forecasting dataset, we show how designing richer data models and engineering the features benefits the performance of the resulting models. Finally, we benchmark three complementary feature management solutions, including an open-source feature store.