Time series forecasting is an important problem with many real-world applications. Ensembles of deep neural networks have recently achieved impressive forecasting accuracy, but such large ensembles are impractical in many real-world settings. Transformer models have been successfully applied to a diverse set of challenging problems. We propose Persistence Initialization, a novel adaptation of the original Transformer architecture for the task of time series forecasting. The model is initialized as a naive persistence model by using a multiplicative gating mechanism combined with a residual skip connection. We use a decoder Transformer with ReZero normalization and Rotary positional encodings, but the adaptation is applicable to any auto-regressive neural network model. We evaluate our proposed architecture on the challenging M4 dataset, achieving performance competitive with ensemble-based methods. We also compare against recently proposed Transformer models for time series forecasting and show superior performance on the M4 dataset. Extensive ablation studies show that Persistence Initialization leads to better performance and faster convergence. As the size of the model increases, only the models with our proposed adaptation gain in performance. An additional ablation study on the choice of normalization and positional encoding finds that both Rotary encodings and ReZero normalization are essential for good forecasting performance.
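The gating idea described in the abstract can be illustrated with a minimal sketch. It assumes the forecast has the form "last observed value plus gate times model output", with the gate initialized to zero so that the untrained model reproduces the naive persistence forecast. The names `PersistenceInitWrapper`, `toy_inner_model`, and `forecast` are hypothetical, the inner model is a toy stand-in for the decoder Transformer, and the exact formulation used by the authors may differ.

```python
import numpy as np


class PersistenceInitWrapper:
    """Hypothetical wrapper: residual skip from the last observation plus a gated correction."""

    def __init__(self, inner_model, gate_init=0.0):
        self.inner_model = inner_model  # any auto-regressive one-step model
        self.gate = gate_init           # scalar gate; would be a trainable parameter in practice

    def predict_next(self, history):
        history = np.asarray(history, dtype=float)
        correction = self.inner_model(history)
        # With gate == 0 this returns history[-1], i.e. the naive persistence forecast.
        return history[-1] + self.gate * correction


def toy_inner_model(history):
    # Assumption for illustration only: predicts the mean first difference of the history.
    return float(np.mean(np.diff(history)))


def forecast(model, history, horizon):
    """Roll the one-step model forward auto-regressively for `horizon` steps."""
    values = list(history)
    for _ in range(horizon):
        values.append(model.predict_next(values))
    return np.array(values[len(history):])


if __name__ == "__main__":
    series = [1.0, 2.0, 4.0]
    print(forecast(PersistenceInitWrapper(toy_inner_model), series, 3))
    print(forecast(PersistenceInitWrapper(toy_inner_model, gate_init=1.0), series, 2))
```

With the gate at zero, the forecast simply repeats the last observation. Once training moves the gate away from zero, the inner model's output begins to adjust that baseline.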