Video frame interpolation has recently become increasingly prevalent in the computer vision field, and a number of deep-learning-based studies have achieved great success. Most of them rely on optical flow, on interpolation kernels, or on a combination of the two. However, these methods ignore the fixed grid restriction on the position of the kernel region when synthesizing each target pixel. This restriction prevents them from adapting well to the irregularity of object shapes and the uncertainty of motion, so irrelevant reference pixels may be used for interpolation. To solve this problem, we revisit deformable convolution for video frame interpolation: it breaks the fixed grid restriction on the kernel region, letting the distribution of reference points better fit the shape of the object and thus warping a more accurate interpolated frame. Experiments on four datasets demonstrate the superior performance of the proposed model over state-of-the-art alternatives.
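To make the core idea concrete, below is a minimal sketch of how deformable convolution lifts the fixed-grid restriction: a small offset head predicts per-pixel, per-tap displacements, so each target pixel draws its reference points from positions that can follow object shape and motion rather than a rigid grid. This uses torchvision's `deform_conv2d`; the offset head, its conditioning on both input frames, and all layer sizes are illustrative assumptions, not the paper's exact architecture.

```python
# A minimal sketch (not the authors' exact model) of deformable-convolution-based
# frame synthesis: learned offsets move the kernel's sampling positions off the
# regular grid, adapting the reference-pixel distribution to object shape.
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformableSynthesis(nn.Module):
    def __init__(self, channels=3, kernel_size=3):
        super().__init__()
        k = kernel_size
        # Offset head (hypothetical): predicts 2 values (dx, dy) per kernel tap
        # for every output pixel, conditioned on both input frames.
        self.offset_head = nn.Conv2d(2 * channels, 2 * k * k, 3, padding=1)
        # Learnable synthesis kernel applied at the deformed sampling positions.
        self.weight = nn.Parameter(torch.randn(channels, channels, k, k) * 0.1)
        self.padding = k // 2

    def forward(self, frame0, frame1):
        # Offsets depend on both frames, so the sampling pattern can adapt
        # to the (unknown) motion of the intermediate frame.
        offsets = self.offset_head(torch.cat([frame0, frame1], dim=1))
        # Sample frame0 at grid positions displaced by the learned offsets
        # and aggregate with the synthesis kernel.
        return deform_conv2d(frame0, offsets, self.weight, padding=self.padding)

# Usage: synthesize a rough intermediate frame from two 256x256 RGB frames.
f0 = torch.rand(1, 3, 256, 256)
f1 = torch.rand(1, 3, 256, 256)
mid = DeformableSynthesis()(f0, f1)
print(mid.shape)  # torch.Size([1, 3, 256, 256])
```

With zero offsets this reduces to an ordinary convolution on the fixed grid; the learned offsets are exactly what lets the kernel region escape that grid.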