Deep learning-based video inpainting has yielded promising results and attracted increasing attention from researchers. These methods usually assume that the corrupted-region mask of every frame is known and easily obtained. However, annotating these masks is labor-intensive and expensive, which limits the practical application of current methods. We therefore relax this assumption by defining a new semi-supervised inpainting setting, in which the network must complete the corrupted regions of the whole video using the annotated mask of only one frame. Specifically, we propose an end-to-end trainable framework consisting of a completion network and a mask prediction network, which are designed, respectively, to generate the content of the corrupted regions in the current frame using the known mask and to decide which regions of the next frame should be filled. In addition, we introduce a cycle consistency loss to regularize the training parameters of the two networks. In this way, the completion network and the mask prediction network constrain each other, and hence the overall performance of the trained model can be maximized. Furthermore, because existing video inpainting datasets inherently contain prior knowledge (e.g., corrupted contents and clear borders), they are not suitable for semi-supervised video inpainting. We therefore create a new dataset by simulating the corrupted videos of real-world scenarios. Extensive experimental results demonstrate the superiority of our model on the video inpainting task. Remarkably, although our model is trained in a semi-supervised manner, it achieves performance comparable to that of fully-supervised methods.
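The frame-by-frame procedure described above can be illustrated with a minimal sketch. It shows only the control flow implied by the abstract: the current frame is completed with the current mask, and the mask for the next frame is then predicted. The functions `toy_complete` and `toy_predict_mask` are hypothetical stand-ins introduced here for illustration; they are not the networks proposed in the paper, and the cycle consistency loss used during training is not modeled.

```python
from typing import Callable, List

Frame = List[List[float]]  # H x W grayscale image
Mask = List[List[int]]     # 1 = corrupted pixel, 0 = valid pixel


def toy_complete(frame: Frame, mask: Mask) -> Frame:
    """Hypothetical stand-in for the completion network.

    Fills every masked pixel with the mean of the valid pixels.
    """
    valid = [p for row, mrow in zip(frame, mask)
             for p, m in zip(row, mrow) if m == 0]
    fill = sum(valid) / len(valid) if valid else 0.0
    return [[fill if m else p for p, m in zip(row, mrow)]
            for row, mrow in zip(frame, mask)]


def toy_predict_mask(completed: Frame, next_frame: Frame, mask: Mask) -> Mask:
    """Hypothetical stand-in for the mask prediction network.

    Assumes the corrupted region does not move, so it copies the mask.
    """
    return [row[:] for row in mask]


def inpaint_video(
    frames: List[Frame],
    first_mask: Mask,
    complete: Callable[[Frame, Mask], Frame] = toy_complete,
    predict_mask: Callable[[Frame, Frame, Mask], Mask] = toy_predict_mask,
) -> List[Frame]:
    """Complete a whole video given the annotated mask of the first frame only."""
    mask = first_mask
    outputs = []
    for t, frame in enumerate(frames):
        # Fill the corrupted regions of the current frame with the known mask.
        completed = complete(frame, mask)
        outputs.append(completed)
        # Decide which regions of the next frame need to be filled.
        if t + 1 < len(frames):
            mask = predict_mask(completed, frames[t + 1], mask)
    return outputs
```

In this sketch the two callables are interchangeable, so trained models could in principle be passed in place of the toy functions, provided they follow the same assumed signatures.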