Multivariate time series forecasting is widely used in practical scenarios. Recently, Transformer-based models have shown significant potential in forecasting tasks due to their ability to capture long-range dependencies. However, recent studies in vision and NLP show that the role of attention modules is unclear and that they can be replaced by other token aggregation operations. This paper investigates the contributions and deficiencies of attention mechanisms to time series forecasting performance. Specifically, we find that (1) attention is not necessary for capturing temporal dependencies, (2) the entanglement and redundancy in capturing temporal and channel interactions degrade forecasting performance, and (3) modeling the mapping between the input and the prediction sequence is important. To this end, we propose MTS-Mixers, which uses two factorized modules to capture temporal and channel dependencies. Experimental results on several real-world datasets show that MTS-Mixers outperforms existing Transformer-based models with higher efficiency.
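To make the factorized design concrete, below is a minimal PyTorch sketch of one mixing block that aggregates information along the temporal and channel dimensions separately, in place of attention. The class name, hidden size, and the residual/normalization layout are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class FactorizedMixerBlock(nn.Module):
    """Hypothetical sketch of factorized temporal/channel mixing.

    Input x has shape (batch, seq_len, n_channels). One MLP mixes
    information across time steps, a second across channels, so the
    two kinds of interaction are captured by separate modules.
    """

    def __init__(self, seq_len: int, n_channels: int, hidden: int = 64):
        super().__init__()
        # Mixes along the temporal axis (applied to the transposed input).
        self.temporal_mlp = nn.Sequential(
            nn.Linear(seq_len, hidden), nn.GELU(), nn.Linear(hidden, seq_len)
        )
        # Mixes along the channel axis.
        self.channel_mlp = nn.Sequential(
            nn.Linear(n_channels, hidden), nn.GELU(), nn.Linear(hidden, n_channels)
        )
        self.norm1 = nn.LayerNorm(n_channels)
        self.norm2 = nn.LayerNorm(n_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Temporal mixing: operate on (batch, channels, seq_len), then restore.
        y = self.temporal_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + y
        # Channel mixing: operate directly on the channel dimension.
        x = x + self.channel_mlp(self.norm2(x))
        return x
```

Consistent with finding (3), a forecasting head on top of such blocks could be a single linear map from the input length to the prediction length applied along the temporal axis, which directly models the input-to-prediction mapping.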