Two Steps Forward and One Behind: Rethinking Time Series Forecasting with Deep Learning

The Transformer is a highly successful deep learning model that has revolutionised the world of artificial neural networks, first in natural language processing and later in computer vision. This model is based on the attention mechanism and is able to capture complex semantic relationships between a variety of patterns present in the input data. Precisely because of these characteristics, the Transformer has recently been exploited for time series forecasting problems, assuming its natural adaptability to the domain of continuous numerical series. Despite the acclaimed results in the literature, some works have raised doubts about the robustness of this approach. In this paper, we further investigate the effectiveness of Transformer-based models applied to the domain of time series forecasting, demonstrate their limitations, and propose a set of alternative models that are better performing and significantly less complex. In particular, we empirically show how simplifying this forecasting model almost always leads to an improvement, reaching the state of the art among Transformer-based architectures. We also propose shallow models without the attention mechanism, which compete with the overall state of the art in long time series forecasting, and demonstrate their ability to accurately predict extremely long windows. We show how it is always necessary to use a simple baseline to verify the effectiveness of one's models, and finally we conclude the paper with a reflection on recent research paths and the desire to follow trends and apply the latest model even where it may not be necessary.
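
The point about always checking a model against a simple baseline can be illustrated with a minimal sketch. The code below is not the paper's models or data: the synthetic sine series, the window sizes, and all function names (`make_windows`, `persistence_forecast`, `fit_linear`, `predict_linear`, `mse`) are assumptions chosen for illustration. It compares a naive "repeat the last value" forecast with a single attention-free linear map from the lookback window to the forecast horizon.

```python
import numpy as np


def make_windows(series, lookback, horizon):
    """Slice a 1-D series into (input window, target window) pairs."""
    n = len(series) - lookback - horizon + 1
    X = np.stack([series[i:i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y


def persistence_forecast(X, horizon):
    """Naive baseline: repeat the last observed value over the whole horizon."""
    return np.repeat(X[:, -1:], horizon, axis=1)


def fit_linear(X, Y):
    """Least-squares linear map (with bias) from lookback window to horizon."""
    A = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return W


def predict_linear(W, X):
    """Apply the fitted linear map to new lookback windows."""
    A = np.hstack([X, np.ones((len(X), 1))])
    return A @ W


def mse(pred, target):
    """Mean squared error over all windows and horizon steps."""
    return float(np.mean((pred - target) ** 2))


# Hypothetical synthetic data: a noisy sine wave with period 24.
rng = np.random.default_rng(0)
t = np.arange(600)
series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(t.size)

# Chronological split so that test windows never overlap the training data.
lookback, horizon = 48, 24
X_train, Y_train = make_windows(series[:480], lookback, horizon)
X_test, Y_test = make_windows(series[480:], lookback, horizon)

W = fit_linear(X_train, Y_train)
naive_mse = mse(persistence_forecast(X_test, horizon), Y_test)
linear_mse = mse(predict_linear(W, X_test), Y_test)

print(f"persistence baseline MSE: {naive_mse:.4f}")
print(f"linear model MSE:         {linear_mse:.4f}")
```

Any more elaborate forecaster evaluated on the same split would need to beat both numbers to justify its extra complexity. On a real benchmark the gap between the two baselines may look very different from this toy periodic signal.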
