Generating non-existent frames from a consecutive video sequence has long been an interesting and challenging problem in video processing. Typical kernel-based interpolation methods predict pixels with a single convolution process that convolves the source frames with spatially adaptive local kernels, which circumvents time-consuming, explicit motion estimation in the form of optical flow. However, when scene motion is larger than the pre-defined kernel size, these methods are prone to yield implausible results. In addition, they cannot directly generate a frame at an arbitrary temporal position, because the learned kernels are tied to the midpoint in time between the input frames. In this paper, we address these problems and propose a novel non-flow, kernel-based approach, which we refer to as enhanced deformable separable convolution (EDSC), that estimates not only adaptive kernels but also offsets, masks, and biases, allowing the network to gather information from a non-local neighborhood. During the learning process, the intermediate time step can be supplied as a control variable through an extension of the coord-conv trick, so that the estimated components vary with the input temporal information. This makes our method capable of producing multiple in-between frames. Furthermore, we investigate the relationships between our method and other typical kernel- and flow-based methods. Experimental results show that our method performs favorably against state-of-the-art methods across a broad range of datasets. Code will be publicly available at \url{https://github.com/Xianhang/EDSC-pytorch}.
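To make the mechanism concrete, the following is a minimal, hypothetical PyTorch sketch of the two ideas named above: per-pixel kernels augmented with offsets, masks, and a bias, so that each output pixel can draw on a non-local neighborhood, and a coord-conv-style constant time channel that conditions those components on the target time step $t$. All module and parameter names here are our own inventions, and for brevity the sketch flattens the paper's separable vertical/horizontal kernel pair into a single set of taps per source frame; it illustrates the technique under these assumptions and is not the authors' implementation.

\begin{verbatim}
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeConditionedDSC(nn.Module):
    """Toy sketch: deformable kernel synthesis conditioned on time t.
    Hypothetical simplification of EDSC; names are illustrative only."""
    def __init__(self, taps=5, feat=32):
        super().__init__()
        self.taps = taps
        # Encoder sees both RGB frames plus one coord-conv time channel.
        self.encode = nn.Sequential(
            nn.Conv2d(6 + 1, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Per-pixel components, one set of taps for each source frame.
        self.kernels = nn.Conv2d(feat, 2 * taps, 3, padding=1)
        self.offsets = nn.Conv2d(feat, 2 * 2 * taps, 3, padding=1)
        self.masks = nn.Conv2d(feat, 2 * taps, 3, padding=1)
        self.bias = nn.Conv2d(feat, 3, 3, padding=1)

    def forward(self, frame0, frame1, t):
        b, _, h, w = frame0.shape
        # Coord-conv extension: a constant plane carrying the time step t,
        # so the estimated kernels/offsets/masks/bias vary with t.
        t_plane = frame0.new_full((b, 1, h, w), float(t))
        feat = self.encode(torch.cat([frame0, frame1, t_plane], dim=1))
        # Convex combination over all taps of both frames.
        kernels = torch.softmax(self.kernels(feat), dim=1)
        kernels = kernels.view(b, 2, self.taps, h, w)
        offsets = self.offsets(feat).view(b, 2, self.taps, 2, h, w)
        masks = torch.sigmoid(self.masks(feat)).view(b, 2, self.taps, h, w)

        # Base sampling grid in normalized [-1, 1] coordinates.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=frame0.device),
            torch.linspace(-1, 1, w, device=frame0.device),
            indexing="ij",
        )
        base = torch.stack([xs, ys], dim=-1).expand(b, h, w, 2)

        out = self.bias(feat)  # start from the per-pixel bias
        for i, frame in enumerate((frame0, frame1)):
            for k in range(self.taps):
                # Learned offsets move each tap off the regular grid,
                # letting the kernel reach a non-local neighborhood.
                dx = offsets[:, i, k, 0] * 2.0 / max(w - 1, 1)
                dy = offsets[:, i, k, 1] * 2.0 / max(h - 1, 1)
                grid = base + torch.stack([dx, dy], dim=-1)
                sampled = F.grid_sample(frame, grid, align_corners=True)
                weight = (kernels[:, i, k] * masks[:, i, k]).unsqueeze(1)
                out = out + weight * sampled
        return out
\end{verbatim}

Because $t$ enters only as an extra input channel, the same trained network can be queried at any intermediate time, e.g. \texttt{model(f0, f1, t=0.25)} and \texttt{model(f0, f1, t=0.75)} for two in-between frames, which is the arbitrary-time property the abstract claims for EDSC.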