Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction capacity from the model, that is, the ability to efficiently capture precise long-range dependency coupling between output and input. Recent studies have shown the potential of the Transformer to increase prediction capacity. However, several severe issues prevent the Transformer from being directly applicable to LSTF, including quadratic time complexity, high memory usage, and the inherent limitations of the encoder-decoder architecture. To address these issues, we design an efficient Transformer-based model for LSTF, named Informer, with three distinctive characteristics: (i) a $ProbSparse$ self-attention mechanism, which achieves $O(L \log L)$ time complexity and memory usage while performing comparably on sequences' dependency alignment; (ii) self-attention distilling, which highlights the dominating attention by halving the input to each cascading layer and so efficiently handles extremely long input sequences; and (iii) a generative-style decoder, which, while conceptually simple, predicts long time-series sequences in one forward operation rather than step by step, drastically improving the inference speed of long-sequence predictions. Extensive experiments on four large-scale datasets demonstrate that Informer significantly outperforms existing methods and provides a new solution to the LSTF problem.
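To make the scaling claims above concrete, the following is a minimal sketch, not the Informer implementation. It compares rough operation counts for $O(L^2)$ and $O(L \log L)$ attention (constant factors dropped) and shows how halving the input at each cascading layer shrinks the sequence. The function names are hypothetical, and pairwise max-pooling is an assumption standing in for the distilling step, which the abstract describes only as halving the layer input.

```python
import math
from typing import Dict, List


def attention_cost(seq_len: int) -> Dict[str, float]:
    """Rough per-layer operation counts with constant factors dropped."""
    return {
        "full": float(seq_len ** 2),                # O(L^2): canonical self-attention
        "sparse": seq_len * math.log(seq_len),      # O(L log L): the bound stated for ProbSparse
    }


def halve_by_max_pool(sequence: List[float]) -> List[float]:
    """Halve a sequence by taking the max of each non-overlapping pair.

    Assumption: pairwise max-pooling is used here only to illustrate
    'halving cascading layer input'; the abstract does not specify the operator.
    """
    return [max(sequence[i:i + 2]) for i in range(0, len(sequence), 2)]


def distilled_lengths(seq_len: int, num_layers: int) -> List[int]:
    """Input length seen by each of num_layers stacked layers under halving."""
    lengths = [seq_len]
    for _ in range(num_layers - 1):
        lengths.append(math.ceil(lengths[-1] / 2))
    return lengths


if __name__ == "__main__":
    for length in (96, 720, 1024):
        cost = attention_cost(length)
        print(length, int(cost["full"]), round(cost["sparse"]))
    print(distilled_lengths(96, 3))
    print(halve_by_max_pool([1.0, 3.0, 2.0, 5.0, 4.0]))
```

Under these assumptions, the gap between the two cost curves widens as the input length grows, and each distilling step halves the length the next layer has to attend over.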