The digital transformation of the energy infrastructure enables new data-driven applications, often supported by machine learning (ML) models. However, domain-specific data transformation, pre-processing and management in modern data-driven pipelines have yet to be addressed. In this paper we present a first study of generic data models capable of supporting the design of feature management solutions, which are the most important component in developing ML-based energy applications. We first propose a taxonomy for designing data models suitable for energy applications and explain how such a model can support the design of features and their subsequent management by specialized feature stores. Using a short-term forecasting dataset, we show how designing richer data models and engineering features improves the performance of the resulting models. Finally, we benchmark three complementary feature management solutions, including an open-source feature store suitable for time series.