In many real-world scenarios, we deal with streaming data that is collected sequentially over time. Because the environment is non-stationary, the distribution of streaming data may change in unpredictable ways, a phenomenon known as concept drift. To handle concept drift, previous methods first detect when and where the drift happens and then adapt models to fit the distribution of the latest data. However, in many cases some underlying factors of the environment's evolution are predictable, which makes it possible to model the future trend of concept drift in the streaming data; such cases have not been fully explored in previous work. In this paper, we propose a novel method, DDG-DA, that can effectively forecast the evolution of the data distribution and improve the performance of models. Specifically, we first train a predictor to estimate the future data distribution, then leverage it to generate training samples, and finally train models on the generated data. We conduct experiments on three real-world tasks (forecasting stock price trends, electricity load, and solar irradiance) and obtain significant improvements on multiple widely used models.
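The three-step pipeline described above (predict the future distribution, generate training samples, train on the generated data) can be illustrated with a toy sketch. This is not the DDG-DA implementation: the synthetic stream, the AR(2) forecast, the Gaussian similarity weights, and all function names below are hypothetical stand-ins chosen only to make the idea concrete, under the assumption that drift follows a predictable periodic pattern.

```python
import numpy as np

def make_stream(n_periods=24, n_per_period=50, cycle=8, seed=0):
    """Toy stream (assumption): y = w_t * x + noise, with a periodically drifting slope w_t."""
    rng = np.random.default_rng(seed)
    slopes = np.sin(2 * np.pi * np.arange(n_periods) / cycle)
    xs = rng.normal(size=(n_periods, n_per_period))
    ys = slopes[:, None] * xs + 0.1 * rng.normal(size=xs.shape)
    return xs, ys, slopes

def period_slopes(xs, ys):
    """Summarize each period's distribution by its least-squares slope."""
    return (xs * ys).sum(axis=1) / (xs * xs).sum(axis=1)

def forecast_next_slope(s):
    """Step 1 (stand-in predictor): AR(2) forecast of the next period's slope."""
    A = np.column_stack([s[1:-1], s[:-2]])
    coef, *_ = np.linalg.lstsq(A, s[2:], rcond=None)
    return coef[0] * s[-1] + coef[1] * s[-2]

def resampling_weights(s, s_next, bandwidth=0.1):
    """Step 2 (stand-in): weight historical periods by similarity to the forecast."""
    w = np.exp(-0.5 * ((s - s_next) / bandwidth) ** 2)
    return w / w.sum()

def train_on_generated(xs, ys, weights, n_samples=2000, seed=1):
    """Step 3: resample history by period weight, then fit a slope on the generated set."""
    rng = np.random.default_rng(seed)
    periods = rng.choice(len(weights), size=n_samples, p=weights)
    cols = rng.integers(xs.shape[1], size=n_samples)
    x, y = xs[periods, cols], ys[periods, cols]
    return (x * y).sum() / (x * x).sum()

xs, ys, true_slopes = make_stream()
hist_x, hist_y = xs[:-1], ys[:-1]        # the last period is held out as "the future"
s = period_slopes(hist_x, hist_y)
s_next = forecast_next_slope(s)          # predicted summary of the future distribution
w = resampling_weights(s, s_next)        # resampling distribution over historical periods
slope_generated = train_on_generated(hist_x, hist_y, w)
slope_latest = s[-1]                     # baseline: adapt to the latest period only
slope_future = true_slopes[-1]           # ground truth for the held-out period
```

Comparing `slope_generated` and `slope_latest` against `slope_future` shows the intended contrast in this toy setting: a model fit to the latest period lags behind the drift, while a model fit to data resampled toward the forecast distribution can anticipate it when the drift is predictable.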