Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting

Transformers have shown great power in time series forecasting due to their global-range modeling ability. However, their performance can degrade severely on non-stationary real-world data, in which the joint distribution changes over time. Previous studies primarily adopt stationarization to attenuate the non-stationarity of the original series for better predictability. But a stationarized series, deprived of its inherent non-stationarity, can be less instructive for forecasting bursty real-world events. This problem, termed over-stationarization in this paper, leads Transformers to generate indistinguishable temporal attentions for different series and impedes the predictive capability of deep models. To tackle the dilemma between series predictability and model capability, we propose Non-stationary Transformers as a generic framework with two interdependent modules: Series Stationarization and De-stationary Attention. Concretely, Series Stationarization unifies the statistics of each input and converts the output with restored statistics for better predictability. To address the over-stationarization problem, De-stationary Attention is devised to recover the intrinsic non-stationary information into temporal dependencies by approximating distinguishable attentions learned from raw series. Our Non-stationary Transformers framework consistently boosts mainstream Transformers by a large margin, reducing MSE by 49.43% on Transformer, 47.34% on Informer, and 46.89% on Reformer, making them the state of the art in time series forecasting. Code is available at this repository: https://github.com/thuml/Nonstationary_Transformers.
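
The Series Stationarization idea described in the abstract (unify the statistics of each input, then restore them on the output) can be illustrated with a minimal sketch. It assumes the statistics are the per-series mean and standard deviation taken over the time axis, which the abstract does not spell out. The function names `stationarize`, `destationarize`, and `naive_forecaster` are hypothetical and are not taken from the linked repository, and the forecaster is only a stand-in for a Transformer.

```python
import numpy as np


def stationarize(x, eps=1e-5):
    """Normalize each series over its time axis.

    x has shape (batch, time, variables). Using mean/std as the
    "statistics" is an assumption made for this sketch.
    """
    mean = x.mean(axis=1, keepdims=True)
    std = np.sqrt(x.var(axis=1, keepdims=True) + eps)
    return (x - mean) / std, mean, std


def destationarize(y, mean, std):
    """Restore the stored input statistics on the model output."""
    return y * std + mean


def naive_forecaster(x_norm, horizon):
    """Hypothetical stand-in for a forecasting model: repeat the last step."""
    return np.repeat(x_norm[:, -1:, :], horizon, axis=1)


rng = np.random.default_rng(0)
# Two input windows with very different level and scale.
x = np.stack([
    rng.normal(10.0, 2.0, size=(96, 3)),
    rng.normal(-50.0, 20.0, size=(96, 3)),
])

x_norm, mean, std = stationarize(x)            # the model sees unified statistics
y_norm = naive_forecaster(x_norm, horizon=24)  # forecast in normalized space
y = destationarize(y_norm, mean, std)          # forecast in the original scale
```

After normalization both windows look alike to the model regardless of their original level and scale. That is the predictability benefit, and it is also the reason the abstract argues that the removed non-stationary information has to be reintroduced elsewhere, which is the role it assigns to De-stationary Attention.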
