Video frame interpolation aims to synthesize one or more frames between two consecutive frames of a video. It has a wide range of applications, including slow-motion video generation, frame-rate up-conversion and video codec development. Some earlier works tackled this problem by assuming per-pixel linear motion between video frames. However, objects often follow non-linear motion patterns in the real world, and some recent methods attempt to model per-pixel motion with non-linear models (e.g., quadratic). A quadratic model can also be inaccurate, especially in the presence of motion discontinuities over time (i.e., sudden jerks) and occlusions, where some of the flow information may be invalid or inaccurate. In this paper, we propose to approximate per-pixel motion using a space-time convolution network that adaptively selects the motion model to be used. Specifically, the network can softly switch between a linear and a quadratic model. To this end, we use an end-to-end 3D CNN encoder-decoder architecture over bidirectional optical flows and occlusion maps to estimate the non-linear motion model of each pixel. Further, a motion refinement module refines the non-linear motion, and the interpolated frames are obtained by simply warping the neighboring frames with the estimated per-pixel motion. Through a set of comprehensive experiments, we validate the effectiveness of our model and show that our method outperforms state-of-the-art algorithms on four datasets (Vimeo, DAVIS, HD and GoPro).
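To illustrate the soft switch between a linear and a quadratic motion model, the following is a minimal NumPy sketch. It is not the paper's implementation: the per-pixel blend weight `w` would be predicted by the 3D CNN in the proposed method, but here it is simply passed in as an input; the quadratic displacement is recovered from the two bidirectional flows under a constant-acceleration assumption.

```python
import numpy as np

def blended_displacement(flow_fwd, flow_bwd, w, t):
    """Per-pixel soft switch between linear and quadratic motion.

    flow_fwd: optical flow from frame 0 to frame +1, shape (H, W, 2)
    flow_bwd: optical flow from frame 0 to frame -1, shape (H, W, 2)
    w:        per-pixel blend weight in [0, 1], shape (H, W, 1);
              hypothetical stand-in for the network's prediction
    t:        target time in (0, 1)
    Returns the displacement field from frame 0 to time t, shape (H, W, 2).
    """
    # Quadratic model: velocity and acceleration recovered from the two
    # flows, assuming constant acceleration across the three frames.
    v = (flow_fwd - flow_bwd) / 2.0      # velocity at frame 0
    a = flow_fwd + flow_bwd              # acceleration
    quad = v * t + 0.5 * a * t ** 2      # quadratic displacement to time t
    lin = flow_fwd * t                   # linear displacement to time t
    # Soft switch: per-pixel convex combination of the two models.
    return w * quad + (1.0 - w) * lin
```

When the motion is truly linear (`flow_bwd == -flow_fwd`), the acceleration term vanishes and both models agree, so the blend weight has no effect; the weight only matters where the two flows disagree about curvature.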