Despite their growing popularity, data-driven models of real-world dynamical systems require large amounts of data. However, due to sensing limitations and privacy concerns, such data are not always available, especially in domains such as energy. Models pre-trained on data gathered in similar contexts have shown enormous potential in addressing these concerns: they can improve predictive accuracy while requiring far less observational data. In theory, however, the risk of negative transfer means that this improvement is neither guaranteed nor uniform across agents. In this paper, using data from several distributed energy resources, we investigate and report preliminary findings on several key questions in this regard. First, we evaluate the improvement in predictive accuracy due to pre-trained models, both with and without fine-tuning. Subsequently, we consider the question of fairness: do pre-trained models yield equal improvements for heterogeneous agents, and how does this translate to downstream utility? Answering these questions can help improve the creation, fine-tuning, and adoption of such pre-trained models.