Recent learning-based inpainting algorithms have achieved compelling results for completing missing regions after removing undesired objects from videos. To maintain temporal consistency across frames, these deep networks often rely heavily on 3D spatial and temporal operations. However, such methods usually suffer from memory constraints and can only handle low-resolution videos. We propose STRA-Net, a novel spatial-temporal residual aggregation framework for high-resolution video inpainting. The key idea is to first learn and apply a spatial and temporal inpainting network on the downsampled, low-resolution videos. We then refine the low-resolution results by aggregating the learned spatial and temporal image residuals (details) onto the upsampled inpainted frames. Both quantitative and qualitative evaluations show that we can produce more temporally coherent and visually appealing results than state-of-the-art methods on high-resolution video inpainting.
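The coarse-to-fine residual-aggregation idea above can be illustrated with a minimal single-frame sketch. This is an assumption-laden toy, not the paper's method: it uses nearest-neighbor upsampling as a stand-in for the learned upsampler, and it only restores spatial residuals from the known region of the current frame (the paper additionally aggregates temporal residuals from aligned neighboring frames to fill detail inside the hole). The function names `upsample` and `aggregate_residual` are illustrative, not from the paper.

```python
import numpy as np

def upsample(img, scale):
    # Nearest-neighbor upsampling (a stand-in for a learned upsampler).
    return img.repeat(scale, axis=0).repeat(scale, axis=1)

def aggregate_residual(lr_inpainted, hr_frame, hole_mask, scale=4):
    """Refine a low-resolution inpainted frame with high-frequency residuals.

    lr_inpainted: low-resolution frame after inpainting (H/scale, W/scale)
    hr_frame:     original high-resolution frame (H, W)
    hole_mask:    boolean (H, W) array, True where pixels were removed
    """
    coarse = upsample(lr_inpainted, scale)   # blurry high-resolution estimate
    residual = hr_frame - coarse             # high-frequency detail (residual)
    residual[hole_mask] = 0.0                # no ground-truth detail inside the hole
    return coarse + residual                 # known region recovers full detail

# Toy usage: a 2x2 low-resolution result upsampled to 8x8.
lr = np.full((2, 2), 0.5)
hr = np.random.rand(8, 8)
hole = np.zeros((8, 8), dtype=bool)
hole[2:5, 2:5] = True
out = aggregate_residual(lr, hr, hole, scale=4)
```

Outside the hole the output exactly recovers the original high-resolution detail; inside the hole it keeps the upsampled low-resolution completion, which is where the temporal residual aggregation in the full method would add detail.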