Predictive learning uses a known state to generate future states over a period of time. Predicting a spatiotemporal sequence is challenging because the sequence varies in both time and space. The mainstream approach models spatial and temporal structure jointly with an RNN-based or transformer-based architecture, and then generates future data autoregressively from what the model has learned. Learning spatial and temporal features simultaneously introduces a large number of parameters, which makes the model hard to train to convergence. In this paper, we propose a modular design that decomposes the spatiotemporal sequence model into two modules: a spatial encoder-decoder and a predictor. The spatial encoder-decoder extracts spatial features by mapping the data into a latent embedding space and generating data back from that space, while the predictor forecasts future embeddings from past ones. By applying this design to existing methods and running experiments on the KTH-Action and MovingMNIST datasets, we improve computational performance and obtain state-of-the-art results.
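The two-module decomposition described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the actual networks, so everything below is an assumption made for illustration: the names `encode`, `decode`, `predict_next`, and `rollout`, the frame and latent sizes, the context length, and the use of random untrained linear maps in place of learned models. It only shows the data flow: frames are encoded once, the predictor rolls forward autoregressively in the latent space, and the decoder maps predicted embeddings back to frames.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are illustrative assumptions, not values from the paper.
FRAME_DIM = 64 * 64   # one flattened grayscale frame
LATENT_DIM = 32       # size of the latent embedding
CONTEXT = 4           # number of past embeddings the predictor sees

# Hypothetical stand-ins for trained modules: random, untrained weights.
W_enc = rng.normal(scale=0.01, size=(FRAME_DIM, LATENT_DIM))
W_dec = rng.normal(scale=0.01, size=(LATENT_DIM, FRAME_DIM))
W_pred = rng.normal(scale=0.01, size=(CONTEXT * LATENT_DIM, LATENT_DIM))


def encode(frames):
    """Spatial encoder: (T, FRAME_DIM) frames -> (T, LATENT_DIM) embeddings."""
    return np.tanh(frames @ W_enc)


def decode(latents):
    """Spatial decoder: (T, LATENT_DIM) embeddings -> (T, FRAME_DIM) frames."""
    return latents @ W_dec


def predict_next(context):
    """Predictor: (CONTEXT, LATENT_DIM) past embeddings -> next embedding."""
    return np.tanh(context.reshape(-1) @ W_pred)


def rollout(past_frames, n_future):
    """Encode past frames, predict autoregressively in latent space, decode."""
    if n_future < 1:
        raise ValueError("n_future must be at least 1")
    if len(past_frames) < CONTEXT:
        raise ValueError("need at least CONTEXT past frames")
    latents = list(encode(past_frames))
    for _ in range(n_future):
        # Each predicted embedding is fed back in as input for the next step.
        context = np.stack(latents[-CONTEXT:])
        latents.append(predict_next(context))
    return decode(np.stack(latents[-n_future:]))


past = rng.random((10, FRAME_DIM))
future = rollout(past, n_future=5)
print(future.shape)
```

In this arrangement the predictor only ever operates on `LATENT_DIM`-sized vectors, which is one plausible reading of why separating the spatial and temporal modules could reduce the parameter count of the temporal part.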