Video frame interpolation (VFI) aims to generate predicted frames by warping learnable motions from bidirectional reference frames. Most existing works use spatio-temporal semantic information extractors for motion estimation and interpolation modeling, but give insufficient consideration to the mechanistic rationality of the generated intermediate motions. In this paper, we reformulate VFI as a multi-variable non-linear (MNL) regression problem and propose a Joint Non-linear Motion Regression (JNMR) strategy to model complicated inter-frame motions. To establish the MNL regression, ConvLSTM is adopted to construct the distribution of complete motions in the temporal dimension. The motion correlations between the target frame and multiple reference frames can then be regressed from the modeled distribution. Moreover, the feature learning network is designed to be optimized for the MNL regression modeling. A coarse-to-fine synthesis enhancement module further learns visual dynamics at different resolutions through repeated regression and interpolation. Experimental results on frame interpolation are highly competitive, demonstrating the effectiveness of the approach and a significant improvement over state-of-the-art methods, and show that the MNL motion regression improves the robustness of complicated motion estimation.
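To illustrate why regressing motion over multiple reference frames can differ from linear interpolation between two frames, the following is a minimal sketch and not the JNMR method itself: it replaces the ConvLSTM-based regression with an ordinary least-squares polynomial fit on a single tracked point. The function name `regress_middle_position`, the quadratic degree, and the toy trajectory are all assumptions made for illustration.

```python
import numpy as np


def regress_middle_position(times, positions, t_target, degree=2):
    """Fit one polynomial per coordinate over several reference frames
    and evaluate it at the target time (hypothetical helper)."""
    times = np.asarray(times, dtype=float)
    positions = np.asarray(positions, dtype=float)  # shape (n_frames, n_coords)
    # np.polyfit accepts a 2-D y and fits each column independently;
    # the result has shape (degree + 1, n_coords), highest power first.
    coeffs = np.polyfit(times, positions, degree)
    powers = t_target ** np.arange(degree, -1, -1)
    return powers @ coeffs


# Toy trajectory (assumed): x(t) = t^2 accelerates, y(t) = 2t is linear.
times = [-1.0, 0.0, 1.0, 2.0]
positions = [[t ** 2, 2.0 * t] for t in times]

# Non-linear regression over all four reference frames, evaluated at t = 0.5.
nonlinear = regress_middle_position(times, positions, 0.5)

# Linear baseline: average of the two frames adjacent to the target.
linear = 0.5 * (np.array(positions[1]) + np.array(positions[2]))

print("non-linear estimate:", nonlinear)
print("linear estimate:    ", linear)
```

On this toy trajectory the quadratic fit recovers the accelerating x-coordinate at the middle time, while the two-frame linear baseline overshoots it; the linear y-coordinate is the same under both estimates.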