We propose an efficient design of Transformer-based models for multivariate time series forecasting and self-supervised representation learning. It is based on two key components: (i) segmentation of time series into subseries-level patches which serve as input tokens to the Transformer; (ii) channel-independence, where each channel contains a single univariate time series and all series share the same embedding and Transformer weights. The patching design naturally has a three-fold benefit: local semantic information is retained in the embedding; computation and memory usage of the attention maps are quadratically reduced for the same look-back window; and the model can attend to a longer history. Our channel-independent patch time series Transformer (PatchTST) significantly improves long-term forecasting accuracy compared with SOTA Transformer-based models. We also apply our model to self-supervised pre-training tasks and attain excellent fine-tuning performance, which outperforms supervised training on large datasets. Transferring the masked pre-trained representation learned on one dataset to other datasets also produces SOTA forecasting accuracy. Code is available at: https://github.com/yuqinie98/PatchTST.
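To make the two components concrete, below is a minimal PyTorch sketch of patching and channel-independence, not the authors' released implementation (see the repository linked above). The class name `PatchTSTSketch` and the hyperparameter values (`patch_len`, `stride`, `d_model`, look-back and horizon lengths) are illustrative placeholders, not the paper's settings. With a look-back window of length L, patch length P, and stride S, the number of input tokens drops from L to roughly L/S, which is where the quadratic reduction in attention cost comes from.

```python
# Minimal sketch (assumed PyTorch backend; hyperparameters are placeholders).
import torch
import torch.nn as nn


class PatchTSTSketch(nn.Module):
    def __init__(self, patch_len=16, stride=8, d_model=128, n_heads=8,
                 n_layers=3, lookback=336, horizon=96):
        super().__init__()
        self.patch_len, self.stride = patch_len, stride
        n_patches = (lookback - patch_len) // stride + 1
        # (i) each subseries-level patch is linearly embedded into one token
        self.embed = nn.Linear(patch_len, d_model)
        self.pos = nn.Parameter(torch.randn(n_patches, d_model) * 0.02)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(n_patches * d_model, horizon)

    def forward(self, x):                       # x: (batch, lookback, channels)
        b, _, c = x.shape
        # (ii) channel-independence: fold channels into the batch dimension so
        # every univariate series passes through the same shared weights
        x = x.permute(0, 2, 1).reshape(b * c, -1)             # (b*c, lookback)
        patches = x.unfold(-1, self.patch_len, self.stride)   # (b*c, n_patches, patch_len)
        tokens = self.embed(patches) + self.pos               # (b*c, n_patches, d_model)
        z = self.encoder(tokens)
        y = self.head(z.flatten(1))                           # (b*c, horizon)
        return y.reshape(b, c, -1).permute(0, 2, 1)           # (batch, horizon, channels)


# usage: forecast 96 steps for 7 channels from a 336-step look-back window
model = PatchTSTSketch()
out = model(torch.randn(4, 336, 7))
print(out.shape)  # torch.Size([4, 96, 7])
```

The forecasting head here is a simple linear layer over the flattened patch representations; for the self-supervised variant described above, it would instead be replaced by a reconstruction head over masked patches.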