Spatiotemporal predictive learning aims to predict future frames from historical prior knowledge. Previous work improves prediction performance by making networks wider and deeper, but this also brings huge memory overhead, which severely hinders the development and application of the technique. Scale is another dimension for improving model performance in common computer vision tasks: multi-scale processing reduces computational requirements and provides a better sense of context. Such an important direction has not been considered or explored by recent RNN models. In this paper, drawing on the benefits of multi-scale processing, we propose a general framework named Multi-Scale RNN (MS-RNN) to boost recent RNN models. We verify the MS-RNN framework through exhaustive experiments on 4 different datasets (Moving MNIST, KTH, TaxiBJ, and HKO-7) and multiple popular RNN models (ConvLSTM, TrajGRU, PredRNN, PredRNN++, MIM, and MotionRNN). The results demonstrate its efficiency: RNN models incorporating our framework have much lower memory cost yet better performance than before. Our code is released at \url{https://github.com/mazhf/MS-RNN}.