Are Transformers Effective for Time Series Forecasting?

Recently, there has been a surge of Transformer-based solutions for the time series forecasting (TSF) task, especially for the challenging long-term TSF problem. The Transformer architecture relies on self-attention to extract semantic correlations between pairs of elements in a long sequence, and this mechanism is permutation-invariant and, to some extent, anti-ordering. In time series modeling, however, we need to extract the temporal relations within an ordered set of continuous points. Consequently, whether Transformer-based techniques are the right solution for long-term time series forecasting is an interesting question to investigate, despite the performance improvements these studies report. In this work, we question the validity of Transformer-based TSF solutions. In the experiments of these works, the non-Transformer baselines used for comparison are mainly autoregressive forecasting solutions, which usually have poor long-term prediction capability because errors inevitably accumulate as each prediction is fed back as input for the next step. In contrast, we compare against an embarrassingly simple architecture named DLinear that performs direct multi-step (DMS) forecasting. DLinear decomposes the time series into a trend series and a remainder series and uses two one-layer linear networks to model these two series for the forecasting task. Surprisingly, it outperforms existing complex Transformer-based models in most cases by a large margin. We therefore conclude that the relatively high long-term forecasting accuracy of Transformer-based TSF solutions reported in existing works has little to do with the temporal relation extraction capability of the Transformer architecture. Instead, it is mainly due to the non-autoregressive DMS forecasting strategy they use. We hope this study also encourages revisiting the validity of Transformer-based solutions for other time series analysis tasks (e.g., anomaly detection) in the future.
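
The remark that self-attention ignores ordering can be checked numerically. The sketch below assumes a single attention head with random projection matrices and no positional encoding; the function name `self_attention` and all sizes are illustrative, not taken from any specific model. Shuffling the input time steps only shuffles the outputs in the same way (strictly, the layer is permutation-equivariant), and any order-insensitive read-out such as a mean over positions is then fully permutation-invariant. Ordering information must therefore be injected separately, for example through positional encodings.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Scaled dot-product self-attention: single head, no positional encoding (assumption).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability for softmax
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)            # row-wise softmax
    return A @ V

rng = np.random.default_rng(0)
n, d = 6, 4                                      # 6 time steps, 4 features (illustrative)
X = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
perm = rng.permutation(n)                        # shuffle the time order

out = self_attention(X, Wq, Wk, Wv)
out_perm = self_attention(X[perm], Wq, Wk, Wv)

# Shuffled input gives the same outputs, shuffled identically.
print(np.allclose(out_perm, out[perm]))                    # True
# Mean-pooling over positions removes any trace of the order.
print(np.allclose(out_perm.mean(axis=0), out.mean(axis=0)))  # True
```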

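A minimal NumPy sketch of the DLinear idea described above, under several assumptions: a centred moving average with edge replication extracts the trend (the kernel size of 25 is an illustrative choice), the remainder is the input minus the trend, and two linear maps over the look-back window, one per component, emit all H future values in a single step, which is the DMS strategy. The model is univariate, the helper names (`moving_average`, `decompose`, `fit_dlinear`, `predict_dlinear`) are hypothetical, and the weights are fitted in closed form with least squares for brevity rather than by the gradient-based training a neural implementation would typically use.

```python
import numpy as np

def moving_average(x, kernel):
    """Centred moving average; edges are padded by repeating the first/last value."""
    if kernel % 2 == 0:
        raise ValueError("kernel must be odd so the output keeps the input length")
    pad = (kernel - 1) // 2
    padded = np.concatenate([np.repeat(x[:1], pad), x, np.repeat(x[-1:], pad)])
    return np.convolve(padded, np.ones(kernel) / kernel, mode="valid")

def decompose(x, kernel=25):
    """Split a window into a trend series and a remainder series."""
    trend = moving_average(x, kernel)
    return trend, x - trend

def features(window, kernel):
    # One weight matrix over [trend, remainder, 1] equals two linear layers
    # (one per component) whose outputs are summed, plus a shared bias.
    trend, remainder = decompose(window, kernel)
    return np.concatenate([trend, remainder, [1.0]])

def make_windows(series, lookback, horizon):
    n = len(series) - lookback - horizon + 1
    X = np.stack([series[i:i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y

def fit_dlinear(series, lookback, horizon, kernel=25):
    X, Y = make_windows(series, lookback, horizon)
    F = np.stack([features(w, kernel) for w in X])
    W, *_ = np.linalg.lstsq(F, Y, rcond=None)   # shape: (2 * lookback + 1, horizon)
    return W

def predict_dlinear(W, window, kernel=25):
    # DMS: one call yields every step of the horizon; nothing is fed back.
    return features(window, kernel) @ W

# Synthetic series (assumption): linear trend + daily-like cycle + noise.
rng = np.random.default_rng(0)
t = np.arange(2000)
series = 0.01 * t + np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(t.size)
train, test = series[:1600], series[1600:]

L, H = 96, 24                                   # look-back and horizon (illustrative)
W = fit_dlinear(train, L, H)
window, target = test[:L], test[L:L + H]
forecast = predict_dlinear(W, window)
print(forecast.shape)                           # (24,)
```

Because trend + remainder reproduces the window exactly, the stacked feature matrix is rank-deficient; `np.linalg.lstsq` handles this by returning the minimum-norm solution, so no extra regularization is needed for this sketch.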