Spatiotemporal predictive learning aims to generate future frames by learning from historical frames. In this paper, we investigate existing methods and present a general framework for spatiotemporal predictive learning, in which a spatial encoder and decoder capture intra-frame features and a temporal module between them captures inter-frame correlations. Mainstream methods employ recurrent units to capture long-term temporal dependencies, but they suffer from low computational efficiency because their recurrent architectures cannot be parallelized. To parallelize the temporal module, we propose the Temporal Attention Unit (TAU), which decomposes temporal attention into intra-frame static attention and inter-frame dynamic attention. Moreover, because the mean squared error loss focuses on intra-frame errors, we introduce a novel differential divergence regularization that takes inter-frame variations into account. Extensive experiments demonstrate that the proposed method enables the derived model to achieve competitive performance on various spatiotemporal prediction benchmarks.
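To make the idea of an inter-frame regularizer concrete, the following is a minimal NumPy sketch, not the formulation used in the paper. The abstract does not give the formula, so everything here is an assumption: differences between consecutive frames are taken, each difference map is normalized with a softmax, and the predicted and ground-truth distributions are compared with a KL divergence. The names `differential_divergence` and `total_loss`, the temperature `tau`, and the weight `alpha` are hypothetical.

```python
import numpy as np


def _softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def differential_divergence(pred, target, tau=1.0, eps=1e-12):
    """Assumed inter-frame regularizer (illustrative only).

    pred, target: arrays of shape (T, C, H, W) with T >= 2.
    Returns the mean KL divergence between the softmax-normalized
    frame-to-frame differences of `pred` and those of `target`.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    if pred.shape[0] < 2:
        raise ValueError("at least two frames are needed")

    steps = pred.shape[0] - 1
    # Forward differences between consecutive frames, flattened per step.
    d_pred = (pred[1:] - pred[:-1]).reshape(steps, -1)
    d_true = (target[1:] - target[:-1]).reshape(steps, -1)

    # Assumption: turn each difference map into a probability distribution.
    p = _softmax(d_pred / tau, axis=-1)
    q = _softmax(d_true / tau, axis=-1)

    # KL(p || q) per time step, averaged over steps.
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(kl.mean())


def total_loss(pred, target, alpha=0.1):
    # Intra-frame term (MSE) plus the assumed inter-frame term.
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = float(np.mean((pred - target) ** 2))
    return mse + alpha * differential_divergence(pred, target)
```

Under these assumptions the two terms are complementary. Adding a constant offset to every predicted frame raises the mean squared error but leaves the frame-to-frame differences, and hence the regularizer, unchanged. Conversely, a prediction with small per-pixel error but the wrong motion pattern is penalized by the divergence term.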