Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction capacity of the model, that is, the ability to efficiently capture precise long-range dependency coupling between output and input. Recent studies have shown the potential of Transformer to increase the prediction capacity. However, several severe issues prevent Transformer from being directly applicable to LSTF, including quadratic time complexity, high memory usage, and the inherent limitation of the encoder-decoder architecture. To address these issues, we design an efficient Transformer-based model for LSTF, named Informer, with three distinctive characteristics: (i) a $\mathit{ProbSparse}$ self-attention mechanism, which achieves $O(L \log L)$ time complexity and memory usage while retaining comparable performance on sequence dependency alignment; (ii) self-attention distilling, which highlights dominating attention by halving the cascading layer input and efficiently handles extremely long input sequences; (iii) a generative-style decoder that, while conceptually simple, predicts long time-series sequences in one forward operation rather than step by step, drastically improving the inference speed of long-sequence predictions. Extensive experiments on four large-scale datasets demonstrate that Informer significantly outperforms existing methods and provides a new solution to the LSTF problem.
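To make the first characteristic concrete, here is a minimal PyTorch sketch (not the paper's implementation) of the idea behind $\mathit{ProbSparse}$ self-attention: rank queries by a max-minus-mean sparsity measurement, let only the top $u = c \cdot \ln L_Q$ "active" queries attend to all keys, and give the remaining lazy queries the mean of $V$. The function name `probsparse_attention` and the `factor` parameter are illustrative; the actual Informer code also samples keys when forming the measurement so the whole step stays $O(L \log L)$, whereas this sketch scores all query-key pairs for clarity.

```python
import math
import torch

def probsparse_attention(Q, K, V, factor=5):
    """Illustrative ProbSparse self-attention sketch.

    Q, K, V: tensors of shape (B, L, d). Only the top-u queries
    (ranked by the sparsity measurement) receive full attention;
    lazy queries fall back to the mean of V.
    """
    B, L_Q, d = Q.shape
    _, L_K, _ = K.shape
    scale = 1.0 / math.sqrt(d)

    # Full score matrix (B, L_Q, L_K); the real method samples keys here.
    scores = torch.matmul(Q, K.transpose(-2, -1)) * scale

    # Sparsity measurement M(q_i, K) = max_j s_ij - mean_j s_ij.
    M = scores.max(dim=-1).values - scores.mean(dim=-1)

    # Keep u = factor * ln(L_Q) active queries.
    u = min(L_Q, int(factor * math.ceil(math.log(max(L_Q, 2)))))
    top_idx = M.topk(u, dim=-1).indices  # (B, u)

    # Lazy queries get mean(V); active queries get softmax attention.
    out = V.mean(dim=1, keepdim=True).expand(B, L_Q, d).clone()
    batch_idx = torch.arange(B).unsqueeze(-1)          # (B, 1)
    attn = torch.softmax(scores[batch_idx, top_idx], dim=-1)
    out[batch_idx, top_idx] = torch.matmul(attn, V)    # (B, u, d)
    return out
```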
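The third characteristic can likewise be sketched in a few lines. In a generative-style decoder, the input is a start token (a slice of the known series) concatenated with zero placeholders for the prediction horizon, so the decoder emits the entire horizon in a single forward pass instead of autoregressively. The helper below is hypothetical; the names `label_len` and `pred_len` are illustrative parameters, not the paper's API.

```python
import torch

def build_decoder_input(x_enc, label_len=48, pred_len=24):
    """Sketch of a generative-style decoder input (assumed layout):
    the last label_len known steps act as a start token, followed by
    pred_len zero placeholders for the future steps to be generated."""
    B, _, d = x_enc.shape
    start_token = x_enc[:, -label_len:, :]  # (B, label_len, d)
    placeholder = torch.zeros(
        B, pred_len, d, dtype=x_enc.dtype, device=x_enc.device
    )
    # One forward pass over this input yields all pred_len outputs at once.
    return torch.cat([start_token, placeholder], dim=1)
```

Because the whole horizon is produced in one pass, inference cost no longer grows with the number of autoregressive decoding steps, which is the source of the speedup claimed above.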