This paper proposes a novel video inpainting method with three main contributions. First, we extend previous Transformers with patch alignment by introducing Deformed Patch-based Homography (DePtH), which improves patch-level feature alignment without additional supervision and benefits challenging scenes with various deformations. Second, we introduce Mask Pruning-based Patch Attention (MPPA), which improves patch-wise feature matching by pruning less essential features and using a saliency map. MPPA enhances matching accuracy between warped tokens with invalid pixels. Third, we introduce a Spatial-Temporal weighting Adaptor (STA) module that obtains accurate attention to spatial-temporal tokens under the guidance of the Deformation Factor learned from DePtH, especially for videos with agile motions. Experimental results demonstrate that our method outperforms recent methods both qualitatively and quantitatively and achieves a new state of the art.