Recently, there has been a surge of Transformer-based solutions for the time series forecasting (TSF) task, especially for the challenging long-term TSF problem. The Transformer architecture relies on self-attention to extract semantic correlations between paired elements in a long sequence, and self-attention is permutation-invariant and therefore, to some extent, insensitive to ordering. In time series modeling, however, the goal is to extract temporal relations among an ordered set of continuous points. Consequently, whether Transformer-based techniques are the right solution for long-term time series forecasting is a question worth investigating, despite the performance improvements reported in these studies. In this work, we question the validity of Transformer-based TSF solutions. In the experiments of those studies, the compared (non-Transformer) baselines are mainly autoregressive forecasting solutions, which usually have poor long-term prediction capability because errors inevitably accumulate across steps. In contrast, we use an embarrassingly simple architecture named DLinear that conducts direct multi-step (DMS) forecasting for comparison. DLinear decomposes the time series into a trend series and a remainder series and employs two one-layer linear networks to model these two series for the forecasting task. Surprisingly, it outperforms existing complex Transformer-based models in most cases by a large margin. Therefore, we conclude that the relatively higher long-term forecasting accuracy of Transformer-based TSF solutions shown in existing works has little to do with the temporal relation extraction capabilities of the Transformer architecture. Instead, it is mainly due to the non-autoregressive DMS forecasting strategy they use. We hope this study also encourages revisiting the validity of Transformer-based solutions for other time series analysis tasks (e.g., anomaly detection) in the future.
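The DLinear idea described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. It assumes the trend is estimated with a centered moving average (the abstract does not say how the decomposition is done), and the kernel size of 25, the helper names (`moving_average`, `init_layer`, `linear_layer`, `dlinear_forecast`), and the random, untrained weights are all hypothetical choices made for illustration. The point is the structure: two one-layer linear maps, one per component, each producing the whole forecast horizon in a single pass (direct multi-step) rather than feeding predictions back in step by step.

```python
import math
import random


def moving_average(series, kernel_size=25):
    """Assumed trend estimate: centered moving average with edge-value padding."""
    left = (kernel_size - 1) // 2
    right = kernel_size - 1 - left
    padded = [series[0]] * left + list(series) + [series[-1]] * right
    return [sum(padded[i:i + kernel_size]) / kernel_size
            for i in range(len(series))]


def init_layer(input_len, horizon, rng):
    """One linear layer: a (horizon x input_len) weight matrix plus a bias vector."""
    bound = 1.0 / math.sqrt(input_len)
    weights = [[rng.uniform(-bound, bound) for _ in range(input_len)]
               for _ in range(horizon)]
    bias = [0.0] * horizon
    return weights, bias


def linear_layer(weights, bias, inputs):
    """Map the full look-back window to all horizon steps at once."""
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def dlinear_forecast(series, trend_layer, remainder_layer, kernel_size=25):
    """Decompose, apply one linear layer per component, and sum the outputs."""
    trend = moving_average(series, kernel_size)
    remainder = [x - t for x, t in zip(series, trend)]
    trend_out = linear_layer(*trend_layer, trend)
    remainder_out = linear_layer(*remainder_layer, remainder)
    return [a + b for a, b in zip(trend_out, remainder_out)]


# Usage with untrained (random) weights, so the values themselves are meaningless;
# only the shapes and the single-pass structure are being shown.
rng = random.Random(0)
input_len, horizon = 96, 24
history = [math.sin(t / 5.0) + 0.01 * t for t in range(input_len)]
trend_layer = init_layer(input_len, horizon, rng)
remainder_layer = init_layer(input_len, horizon, rng)
forecast = dlinear_forecast(history, trend_layer, remainder_layer)
print(len(forecast))  # 24
```

In practice the two weight matrices would be fitted to data, for example by minimizing a forecasting loss with gradient descent; that training step is omitted here.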