Video frame interpolation aims to synthesize intermediate frames from nearby source frames while maintaining spatial and temporal consistency. Existing deep-learning-based video frame interpolation methods can be roughly divided into two categories: flow-based methods and kernel-based methods. The performance of flow-based methods is often jeopardized by inaccurate flow map estimation caused by oversimplified motion models, while that of kernel-based methods tends to be constrained by the rigidity of the kernel shape. To address these performance-limiting issues, a novel mechanism named generalized deformable convolution is proposed, which can effectively learn motion information in a data-driven manner and freely select sampling points in space-time. We further develop a new video frame interpolation method based on this mechanism. Our extensive experiments demonstrate that the new method performs favorably against the state of the art, especially when dealing with complex motions.
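To make the "freely select sampling points in space-time" idea concrete, below is a minimal sketch of space-time deformable sampling, assuming PyTorch (≥1.10). The module name `SpaceTimeDeformableSampler`, the `n_points` parameter, and the single offset/weight prediction head are illustrative assumptions, not the paper's actual architecture.

```python
# A minimal sketch, assuming PyTorch: for each output pixel, predict K free
# (t, y, x) sampling locations over the stacked source frames, trilinearly
# sample them, and blend the samples with learned weights. Names and the
# head design are hypothetical, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpaceTimeDeformableSampler(nn.Module):
    def __init__(self, channels: int, n_points: int = 4):
        super().__init__()
        self.n_points = n_points
        # Per sampling point: a 3-D offset (dt, dy, dx) plus one blend weight.
        self.head = nn.Conv2d(channels, 4 * n_points, kernel_size=3, padding=1)

    def forward(self, frames: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
        # frames: (B, C, T, H, W) stacked source frames (or their features)
        # feat:   (B, C, H, W)   feature map from which offsets are predicted
        b, c, t, h, w = frames.shape
        k = self.n_points
        pred = self.head(feat)                             # (B, 4K, H, W)
        offsets, weights = pred.split([3 * k, k], dim=1)
        weights = weights.softmax(dim=1)                   # normalize blend weights
        offsets = offsets.view(b, k, 3, h, w)              # (B, K, 3, H, W)

        # Base grid in normalized [-1, 1] coords; t = 0 is the temporal middle.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
        )
        base = torch.stack([torch.zeros_like(xs), ys, xs])  # (3, H, W) as (t, y, x)
        grid = base.to(frames) + offsets                    # (B, K, 3, H, W)

        # grid_sample on 5-D input wants grid (B, D_out, H, W, 3) ordered
        # (x, y, t); K plays the role of D_out, and mode="bilinear" performs
        # trilinear interpolation for 5-D inputs.
        grid = grid.permute(0, 1, 3, 4, 2).flip(-1)         # (B, K, H, W, 3)
        sampled = F.grid_sample(
            frames, grid, mode="bilinear", padding_mode="border", align_corners=True
        )                                                   # (B, C, K, H, W)

        # Weighted aggregation of the K freely placed space-time samples.
        return (sampled * weights.view(b, 1, k, h, w)).sum(dim=2)  # (B, C, H, W)
```

For example, `SpaceTimeDeformableSampler(channels=64)` applied to `frames` of shape `(2, 64, 4, 128, 128)` and `feat` of shape `(2, 64, 128, 128)` yields a `(2, 64, 128, 128)` output. Because the predicted offsets are unconstrained, each sample can land anywhere in space-time, unlike a flow-based method (one displacement per pixel) or a kernel-based method (a fixed local sampling grid).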