Time series forecasting is widely used in equipment life-cycle prediction, weather forecasting, traffic flow forecasting, and many other fields. Recently, some researchers have applied the Transformer to time series forecasting because of its powerful capacity for parallel training. However, existing Transformer-based methods do not pay enough attention to the small time segments that play a decisive role in prediction, which makes them insensitive to the small changes that affect the trend of a time series and prevents them from effectively learning continuous time-dependent features. To address this problem, we propose a differential attention fusion model based on the Transformer, which adds a differential layer, neighbor attention, a sliding fusion mechanism, and a residual layer to the classical Transformer architecture. Specifically, the differences between adjacent time points are extracted by differencing and emphasized by neighbor attention. The sliding fusion mechanism fuses the various features of each time point so that the data can participate in encoding and decoding without losing important information. The residual layer, which combines convolution and LSTM, further learns the dependencies between time points and allows the model to be trained at greater depth. Extensive experiments on three datasets show that the predictions produced by our method compare favorably with the state-of-the-art.
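To make the first two components concrete, the sketch below gives one plausible PyTorch reading of a differential layer (first-order differences of adjacent time points) and of neighbor attention (self-attention masked to a small local window). It is a minimal illustration under our own assumptions, not the paper's released code: the module names, the window size, and all shapes and hyperparameters are illustrative.

```python
# Minimal sketch of two of the abstract's components; all names and
# hyperparameters here are assumptions for illustration only.
import torch
import torch.nn as nn


class DifferentialLayer(nn.Module):
    """First-order differences of adjacent time points, length-preserving."""

    def forward(self, x):                      # x: (batch, seq_len, features)
        diff = x[:, 1:, :] - x[:, :-1, :]      # adjacent-point differences
        pad = torch.zeros_like(x[:, :1, :])    # pad so seq_len is unchanged
        return torch.cat([pad, diff], dim=1)


class NeighborAttention(nn.Module):
    """Self-attention restricted to a local window around each time point."""

    def __init__(self, d_model, n_heads=4, window=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.window = window

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        idx = torch.arange(seq_len, device=x.device)
        # Boolean mask: True entries are *blocked*, i.e. positions more
        # than `window` steps apart cannot attend to each other.
        mask = (idx[None, :] - idx[:, None]).abs() > self.window
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out


if __name__ == "__main__":
    x = torch.randn(8, 96, 64)                 # toy batch: 96-step series
    feats = DifferentialLayer()(x)             # emphasize small local changes
    print(NeighborAttention(d_model=64)(feats).shape)  # -> (8, 96, 64)
```

Differencing removes the slowly varying level of the series so that the attention operates directly on local changes, which is one way to make small trend-affecting fluctuations visible to the model.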