Recently, there has been a surge of Transformer-based solutions for the long-term time series forecasting (LTSF) task. Despite the growing performance over the past few years, we question the validity of this line of research in this work. Specifically, Transformers are arguably the most successful solution for extracting semantic correlations among the elements in a long sequence. However, in time series modeling, the goal is to extract the temporal relations in an ordered set of continuous points. While employing positional encoding and using tokens to embed sub-series in Transformers helps preserve some ordering information, the \emph{permutation-invariant} nature of the self-attention mechanism inevitably results in temporal information loss. To validate our claim, we introduce a set of embarrassingly simple one-layer linear models named LTSF-Linear for comparison. Experimental results on nine real-life datasets show that LTSF-Linear surprisingly outperforms existing sophisticated Transformer-based LTSF models in all cases, and often by a large margin. Moreover, we conduct comprehensive empirical studies to explore the impacts of various design elements of LTSF models on their temporal relation extraction capability. We hope this surprising finding opens up new research directions for the LTSF task. We also advocate revisiting the validity of Transformer-based solutions for other time series analysis tasks (e.g., anomaly detection) in the future. Code is available at: \url{https://github.com/cure-lab/LTSF-Linear}.
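To make the "one-layer linear model" concrete, the following is a minimal sketch of what such a baseline could look like in PyTorch: a single linear map from the look-back window to the forecast horizon, applied per variate. The class name, hyperparameters, and tensor shapes are illustrative assumptions, not the reference implementation; the authors' actual LTSF-Linear code is in the repository linked above.

\begin{verbatim}
import torch
import torch.nn as nn

class SimpleLinearForecaster(nn.Module):
    """Sketch of a one-layer linear LTSF baseline (assumed form, not the
    official LTSF-Linear code): one linear map from the look-back window
    to the forecast horizon, shared across channels."""
    def __init__(self, seq_len: int, pred_len: int):
        super().__init__()
        self.linear = nn.Linear(seq_len, pred_len)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, seq_len, channels] -> prediction: [batch, pred_len, channels]
        return self.linear(x.permute(0, 2, 1)).permute(0, 2, 1)

# Usage: forecast 720 future steps from a 336-step look-back window
# (window lengths and the 7 variates are placeholder choices).
model = SimpleLinearForecaster(seq_len=336, pred_len=720)
y_hat = model(torch.randn(32, 336, 7))
print(y_hat.shape)  # torch.Size([32, 720, 7])
\end{verbatim}

The point of the sketch is that the entire temporal modeling capacity sits in a single weight matrix over the time axis, with no attention, positional encoding, or tokenization involved.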