The predictive learning of spatiotemporal sequences aims to generate future images by learning from the historical context, where the visual dynamics are believed to have modular structures that can be learned with compositional subsystems. This paper models these structures by presenting PredRNN, a new recurrent network in which a pair of memory cells are explicitly decoupled, operate with nearly independent transition dynamics, and ultimately form unified representations of the complex environment. Concretely, besides the original memory cell of LSTM, this network features a zigzag memory flow that propagates in both bottom-up and top-down directions across all layers, enabling the learned visual dynamics at different levels of the RNN to communicate. It also leverages a memory decoupling loss to keep the memory cells from learning redundant features. We further improve PredRNN with a new curriculum learning strategy, which can be generalized to most sequence-to-sequence RNNs in predictive learning scenarios. We provide detailed ablation studies, gradient analyses, and visualizations to verify the effectiveness of each component. We show that our approach obtains highly competitive results on three standard datasets: the synthetic Moving MNIST dataset, the KTH human action dataset, and a radar echo dataset for precipitation forecasting.
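To make the memory decoupling loss concrete, below is a minimal PyTorch sketch of one plausible formulation: penalizing the absolute cosine similarity between the per-step increments written into the two memory cells so that they learn non-redundant features. The function name and the tensor names `delta_c` and `delta_m` are hypothetical, and the exact normalization and the point at which the increments are taken may differ from the paper's formulation.

```python
import torch
import torch.nn.functional as F

def memory_decoupling_loss(delta_c: torch.Tensor,
                           delta_m: torch.Tensor,
                           eps: float = 1e-8) -> torch.Tensor:
    """Sketch of a decoupling penalty between the increments of the two
    memory cells (a hypothetical formulation, not the paper's exact one).

    delta_c, delta_m: gated increments written into the temporal cell C
    and the spatiotemporal cell M at one timestep of one layer, each of
    shape (batch, channels, height, width).
    """
    b, c = delta_c.shape[:2]
    # Flatten the spatial dimensions so similarity is computed per channel.
    dc = delta_c.reshape(b, c, -1)
    dm = delta_m.reshape(b, c, -1)
    # Absolute cosine similarity per (batch, channel), averaged; a value
    # near zero means the two cells' updates are nearly orthogonal.
    cos = F.cosine_similarity(dc, dm, dim=-1, eps=eps)
    return cos.abs().mean()
```

Under this reading, the penalty would be accumulated over all layers and timesteps and added to the frame prediction loss with a weighting coefficient, so minimizing it drives the two memory transitions toward independent, complementary dynamics.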