Video frame interpolation is the task of synthesizing an intermediate frame given two consecutive frames. Most previous studies have focused on appropriate frame warping operations and refinement modules for the warped frames, and they have been conducted on natural videos containing only continuous motions. However, many practical videos contain a lot of discontinuous motions, such as chat windows, watermarks, GUI elements, or subtitles. To address these issues, we propose three techniques that expand the concept of transition between two consecutive frames. The first is a new architecture that can separate continuous and discontinuous motion areas. We also propose a novel data augmentation strategy called figure-text mixing (FTM) to make our model learn more general scenarios. Finally, we propose loss functions that, together with the data augmentation, supervise the discontinuous motion areas. We collected a special dataset consisting of mobile game and chatting videos. We show that our method significantly improves the interpolation quality of videos in this special dataset. Moreover, our model outperforms state-of-the-art methods on natural video datasets containing only continuous motions, such as DAVIS and UCF101.
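As a rough illustration of how a figure-text-mixing-style augmentation could operate, the Python sketch below pastes an overlay patch onto a frame triplet either statically (the same position in all frames, like a watermark) or discontinuously (jumping between positions, like blinking GUI text). The function name, patch content, sizes, and probabilities are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
# A minimal sketch (assumed details) of a figure-text-mixing-style
# augmentation: paste an overlay patch onto a triplet of frames
# (frame0, ground-truth middle frame, frame1).
import numpy as np

def ftm_augment(frame0, frame1, gt, patch, rng, p_static=0.5):
    """frame0/frame1/gt: HxWx3 arrays; patch: hxwx3 array (h<=H, w<=W)."""
    H, W, _ = frame0.shape
    h, w, _ = patch.shape

    def paste(img, y, x):
        out = img.copy()
        out[y:y + h, x:x + w] = patch
        return out

    def rand_pos():
        return rng.integers(0, H - h + 1), rng.integers(0, W - w + 1)

    if rng.random() < p_static:
        # Static overlay (e.g., a watermark): same position in every frame.
        y, x = rand_pos()
        return paste(frame0, y, x), paste(frame1, y, x), paste(gt, y, x)
    # Discontinuous overlay (e.g., blinking text): it jumps between two
    # positions, so the ground-truth middle frame shows it at one of the
    # endpoint positions rather than at an interpolated halfway point.
    ya, xa = rand_pos()
    yb, xb = rand_pos()
    ygt, xgt = (ya, xa) if rng.random() < 0.5 else (yb, xb)
    return paste(frame0, ya, xa), paste(frame1, yb, xb), paste(gt, ygt, xgt)

# Tiny usage example with random frames and a white patch standing in for
# rendered text or a figure.
rng = np.random.default_rng(0)
f0, f1, gt = (rng.random((256, 256, 3)) for _ in range(3))
patch = np.ones((32, 64, 3))
f0a, f1a, gta = ftm_augment(f0, f1, gt, patch, rng)
```

The key design point this sketch tries to capture is that the augmented ground truth for a discontinuous overlay is deliberately not an interpolated position: supervising the model with an endpoint position teaches it to copy, rather than warp, discontinuously moving content.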