The performance of video prediction has been greatly boosted by advanced deep neural networks. However, most current methods suffer from large model sizes and require extra inputs, e.g., semantic or depth maps, to achieve promising performance. For efficiency, in this paper we propose a Dynamic Multi-scale Voxel Flow Network (DMVFN) that achieves better video prediction performance than previous methods at lower computational cost, using only RGB images. The core of our DMVFN is a differentiable routing module that effectively perceives the motion scales of video frames. Once trained, our DMVFN selects adaptive sub-networks for different inputs at the inference stage. Experiments on several benchmarks demonstrate that our DMVFN is an order of magnitude faster than Deep Voxel Flow and surpasses the state-of-the-art iterative-based OPT in generated image quality. Our code and demo are available at https://huxiaotaostasy.github.io/DMVFN/.
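To illustrate the idea of input-adaptive sub-network selection described above, the following is a minimal, hypothetical sketch. The real DMVFN router is a learned, differentiable network trained end-to-end; here a hand-set threshold rule on a crude motion proxy stands in for it, and all names and thresholds are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def route(frame_pair_diff, thresholds=(0.05, 0.15, 0.30)):
    """Toy stand-in for a routing module (not DMVFN's actual router).

    frame_pair_diff: per-pixel absolute difference between two consecutive
    frames, used here as a crude proxy for motion magnitude.
    Returns a boolean mask saying which sub-networks to execute,
    ordered from fine (small motion) to coarse (large motion).
    """
    motion = float(np.mean(np.abs(frame_pair_diff)))
    # Larger estimated motion activates additional, coarser-scale
    # sub-networks; small motion keeps the computation cheap.
    return np.array([motion > t for t in thresholds])

# Small inter-frame change: only the finest-scale sub-network runs.
small = route(np.full((8, 8), 0.1))
# Large inter-frame change: all sub-networks run.
large = route(np.full((8, 8), 0.5))
print(small, large)
```

At inference a hard selection like this skips entire sub-networks, which is what allows per-input computational cost to shrink; during training the selection would instead need a differentiable (soft) relaxation so gradients can flow through the router.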