In the current context of Big Data, the nature of many forecasting problems has changed from predicting isolated time series to predicting many time series from similar sources. This has opened up the opportunity to develop competitive global forecasting models that learn simultaneously from many time series. However, it remains unclear when global forecasting models can outperform univariate benchmarks, especially along the dimensions of the homogeneity/heterogeneity of the series, the complexity of the patterns in the series, the complexity of the forecasting models, and the lengths/number of series. Our study attempts to address this problem by investigating the effect of these factors on simulated datasets with controllable time series characteristics. Specifically, we simulate time series from data generating processes (DGPs) that range from simple ones, such as Autoregressive (AR) and Seasonal AR, to complex ones, such as the Chaotic Logistic Map, Self-Exciting Threshold Autoregressive, and Mackey-Glass equations. Data heterogeneity is introduced by mixing time series generated from several DGPs into a single dataset. The lengths and the number of series in the dataset are varied across scenarios. We perform experiments on these datasets using global forecasting models, including Recurrent Neural Networks (RNN), Feed-Forward Neural Networks, Pooled Regression (PR) models and Light Gradient Boosting Models (LGBM), and compare their performance against standard statistical univariate forecasting techniques. Our experiments demonstrate that, when trained as global forecasting models, techniques with complex non-linear modelling capabilities, such as RNNs and LGBMs, are in general competitive under challenging forecasting scenarios, such as short series, datasets with heterogeneous series, and minimal prior knowledge of the patterns in the series.
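The simulation setup described in the abstract can be illustrated with a minimal sketch, assuming NumPy and two of the named DGPs (an AR process and the chaotic logistic map). The function names, the AR coefficients, the logistic-map parameter `r = 3.9`, the burn-in length, and the alternating mixing rule are all illustrative assumptions, not the settings used in the study.

```python
import numpy as np


def simulate_ar(coefs, length, rng, burn_in=100, noise_sd=1.0):
    """Simulate one AR(p) series: y_t = sum_i coefs[i] * y_{t-1-i} + e_t."""
    p = len(coefs)
    total = length + burn_in
    y = np.zeros(total + p)
    noise = rng.normal(0.0, noise_sd, size=total + p)
    for t in range(p, total + p):
        # y[t - p:t][::-1] is (y_{t-1}, ..., y_{t-p}), matching coefs order
        y[t] = np.dot(coefs, y[t - p:t][::-1]) + noise[t]
    # Drop the burn-in so the returned values do not depend on the zero start
    return y[-length:]


def simulate_logistic_map(length, rng, r=3.9, burn_in=100):
    """Simulate the logistic map x_{t+1} = r * x_t * (1 - x_t)."""
    x = rng.uniform(0.1, 0.9)  # random initial condition per series
    out = np.empty(length + burn_in)
    for t in range(length + burn_in):
        x = r * x * (1.0 - x)
        out[t] = x
    return out[burn_in:]


def make_dataset(n_series, length, seed=0, heterogeneous=True):
    """Build a dataset of equal-length series.

    Homogeneous: every series comes from the same AR(2) process.
    Heterogeneous: series alternate between AR(2) and the logistic map
    (an assumed mixing rule, chosen only for illustration).
    """
    rng = np.random.default_rng(seed)
    series, labels = [], []
    for i in range(n_series):
        if heterogeneous and i % 2 == 1:
            series.append(simulate_logistic_map(length, rng))
            labels.append("logistic")
        else:
            # Assumed stationary AR(2) coefficients
            series.append(simulate_ar([0.6, -0.3], length, rng))
            labels.append("ar")
    return np.vstack(series), labels


if __name__ == "__main__":
    data, labels = make_dataset(n_series=6, length=50, seed=1)
    print(data.shape, labels)
```

Varying `n_series` and `length` corresponds to the scenarios in which the number and lengths of the series change, and the `heterogeneous` flag switches between a single-DGP dataset and a mixed one.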