Active learning (AL) could help solve critical environmental problems by improving spatio-temporal predictions. Yet such predictions involve high-dimensional feature spaces with mixed data types and missing data, which existing methods handle poorly. Here, we propose a novel batch AL method that fills this gap. We encode and cluster the features of candidate data points, and query the best candidates based on the distance of their embedded features to their cluster centers. We introduce a new metric of informativeness, which we call embedding entropy, and a general class of neural networks, which we call embedding networks, for using it. Empirical tests on forecasting electricity demand show a simultaneous reduction in prediction error by up to 63-88% and in data usage by up to 50-69% compared to passive learning (PL) benchmarks.
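The query step summarized above (cluster the embedded candidates, then select by distance to the cluster centers) can be illustrated with a minimal Python sketch. It is not the method's actual implementation and rests on several assumptions: the embeddings are taken as already computed (random vectors stand in for the output of an embedding network), clustering is plain k-means with one cluster per queried point, and the candidate closest to each center is chosen, with a `closest` flag to pick the farthest instead, since the abstract does not state the direction. The names `kmeans` and `select_batch` are hypothetical, and embedding entropy is not computed here.

```python
import numpy as np


def kmeans(points, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in); returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise the centers from distinct, randomly chosen points.
    start = rng.choice(len(points), size=n_clusters, replace=False)
    centers = points[start].astype(float)
    for _ in range(n_iter):
        # Distance of every point to every center: shape (n_points, n_clusters).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            members = points[labels == k]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[k] = members.mean(axis=0)
    # Final assignment against the updated centers.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)


def select_batch(embeddings, batch_size, closest=True, seed=0):
    """Query at most one candidate per cluster, ranked by distance to its center."""
    centers, labels = kmeans(embeddings, n_clusters=batch_size, seed=seed)
    # Distance of each embedded candidate to the center of its own cluster.
    dist_to_center = np.linalg.norm(embeddings - centers[labels], axis=1)
    chosen = []
    for k in range(batch_size):
        members = np.flatnonzero(labels == k)
        if members.size == 0:
            continue  # an empty cluster contributes no query
        d = dist_to_center[members]
        pick = members[d.argmin() if closest else d.argmax()]
        chosen.append(int(pick))
    return chosen


# Stand-in embeddings: 200 candidate points in an 8-dimensional embedded space.
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(200, 8))
batch = select_batch(embeddings, batch_size=5)
print(batch)  # indices of the candidates to query in this batch
```

Using one cluster per queried point spreads a batch across the embedded feature space, which is one simple way to keep the queries within a batch from being redundant. Whether the closest or the farthest candidate in each cluster is more informative depends on the informativeness metric, which this sketch leaves open.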