Video frame interpolation (VFI) is the task of synthesizing an intermediate frame given two consecutive frames. Most previous studies have focused on appropriate frame warping operations and refinement modules for the warped frames. These studies have been conducted on natural videos containing only continuous motions. However, many practical videos contain various unnatural objects with discontinuous motions, such as logos, user interfaces, and subtitles. We propose three techniques to make existing deep-learning-based VFI architectures robust to these elements. The first is a novel data augmentation strategy called figure-text mixing (FTM), which enables models to learn discontinuous motions during the training stage without any extra dataset. Second, we propose a simple but effective module that predicts a map called the discontinuity map (D-map), which densely distinguishes between areas of continuous and discontinuous motion. Lastly, we propose loss functions that provide supervision on the discontinuous motion areas and can be applied along with FTM and the D-map. We additionally collect a special test benchmark called the Graphical Discontinuous Motion (GDM) dataset, consisting of mobile-game and chatting videos. Applied to various state-of-the-art VFI networks, our method significantly improves interpolation quality on videos not only from the GDM dataset but also from existing benchmarks containing only continuous motions, such as Vimeo90K, UCF101, and DAVIS.
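The FTM idea above can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything below is an assumption for illustration only: a random patch stands in for a logo or subtitle, it is pasted onto a training triplet either at a fixed position (a static overlay) or at independently sampled positions per frame (discontinuous motion), and the function name `ftm_augment` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def ftm_augment(frames, patch_size=16):
    """Hypothetical FTM-style augmentation: overlay a shared random patch
    (a stand-in for a logo/subtitle) onto a frame triplet. With prob. 0.5
    the patch stays at one fixed position across all frames (static overlay);
    otherwise it jumps to an independent position in each frame, simulating
    discontinuous motion. Returns the augmented frames and a flag telling
    which case was sampled (usable as discontinuity supervision)."""
    h, w, _ = frames[0].shape
    patch = rng.integers(0, 256, (patch_size, patch_size, 3), dtype=np.uint8)
    static = bool(rng.random() < 0.5)
    if static:
        y = int(rng.integers(0, h - patch_size))
        x = int(rng.integers(0, w - patch_size))
    out = []
    for f in frames:
        f = f.copy()
        if not static:
            # resample the location per frame -> discontinuous motion
            y = int(rng.integers(0, h - patch_size))
            x = int(rng.integers(0, w - patch_size))
        f[y:y + patch_size, x:x + patch_size] = patch
        out.append(f)
    return out, static

# Usage on a dummy triplet (prev, ground-truth middle, next)
triplet = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(3)]
augmented, is_static = ftm_augment(triplet)
```

Because no extra data is required, such an augmentation can be dropped into any existing VFI training pipeline; the `static` flag here also hints at how dense discontinuity labels for a D-map could be derived from the pasted region.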