Spatiotemporal predictive learning aims to predict future frames from historical observations. Previous work improves performance by making the network wider and deeper, but this brings huge memory overhead, which seriously hinders the development and application of the technology. Scale is another dimension along which to improve model performance in common computer vision tasks: multi-scale processing can reduce computing requirements and capture richer context. Yet this important dimension has not been considered or explored by recent RNN models. In this paper, drawing on the benefits of multi-scale processing, we propose a general framework named Multi-Scale RNN (MS-RNN) to boost recent RNN models. We verify the MS-RNN framework through exhaustive experiments with 6 popular RNN models (ConvLSTM, TrajGRU, PredRNN, PredRNN++, MIM, and MotionRNN) on 4 different datasets (Moving MNIST, KTH, TaxiBJ, and HKO-7). The results show that RNN models incorporating our framework achieve much lower memory cost and better performance than before. Our code is released at \url{https://github.com/mazhf/MS-RNN}.
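To illustrate why multi-scale processing lowers memory cost, the following is a minimal sketch (under assumed layer sizes, not the paper's exact architecture): downsampling the hidden states of the middle recurrent layers shrinks their spatial resolution, so the total number of stored activations drops sharply.

```python
# Hypothetical illustration: hidden-state activation counts for a stacked
# recurrent network, with and without multi-scale downsampling.
# The layer sizes and scale factors below are assumptions for illustration.

def hidden_activations(height, width, channels, scales):
    """Total hidden-state elements across layers.

    scales[i] is the spatial downsampling factor applied before layer i
    (1 keeps the resolution, 2 halves height and width).
    """
    total = 0
    h, w = height, width
    for s in scales:
        h //= s
        w //= s
        total += h * w * channels
    return total

# Plain stacked RNN: four layers, all at full 64x64 resolution.
plain = hidden_activations(64, 64, 64, [1, 1, 1, 1])

# Multi-scale variant: halve the resolution before layers 2 and 3,
# as in a U-Net-like encoder path.
multi = hidden_activations(64, 64, 64, [1, 2, 2, 1])

print(plain, multi)  # the multi-scale variant stores far fewer elements
```

Under these assumed sizes, halving the resolution of two layers cuts their per-layer activations by 4x and 16x respectively, which is the intuition behind the reported memory savings.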