To address the challenging problem of portrait video matting more precisely, existing works typically rely on matting priors that take additional user effort to obtain, such as annotated trimaps or background images. In this work, we observe that instead of asking the user to explicitly provide a background image, we can recover it from the input video itself. To this end, we first propose a novel background restoration module (BRM) that dynamically recovers the background image from the input video. BRM is extremely lightweight and can be easily integrated into existing matting models. By combining BRM with MODNet, a recent image matting model, we then present MODNet-V for portrait video matting. Benefiting from the strong background prior provided by BRM, MODNet-V has only 1/3 of the parameters of MODNet yet achieves comparable or even better performance. Our design allows MODNet-V to be trained end-to-end on a single NVIDIA 3090 GPU. Finally, we introduce a new patch refinement module (PRM) that adapts MODNet-V to high-resolution videos while keeping it lightweight and fast.
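The abstract does not describe how BRM restores the background internally. Purely as an illustration of the underlying idea that a background plate can be recovered from the video itself, the sketch below keeps a running background estimate and, for each frame, updates it only at pixels that a predicted alpha matte marks as background. As the subject moves, newly revealed regions fill in. This is not the paper's BRM: the function name `update_background`, the threshold `thresh`, and the exponential-moving-average `momentum` are hypothetical choices made for this example.

```python
import numpy as np


def update_background(background, observed, frame, alpha, thresh=0.05, momentum=0.9):
    """Update a running background plate from one video frame (hypothetical sketch).

    background: (H, W, 3) float array, current background estimate.
    observed:   (H, W) bool array, True where background has been seen at least once.
    frame:      (H, W, 3) float array, current RGB frame.
    alpha:      (H, W) float array in [0, 1], predicted matte (1 = foreground).
    Returns the updated (background, observed) pair; the inputs are not modified.
    """
    # Pixels the matte marks as (almost) pure background in this frame.
    is_bg = alpha < thresh
    new_bg = background.copy()

    # Background revealed for the first time: copy the frame's pixels directly.
    first = is_bg & ~observed
    new_bg[first] = frame[first]

    # Background seen before: blend with an exponential moving average
    # to smooth out noise and small lighting changes.
    seen = is_bg & observed
    new_bg[seen] = momentum * background[seen] + (1.0 - momentum) * frame[seen]

    # Pixels still covered by the subject keep their previous estimate.
    return new_bg, observed | is_bg
```

In a video loop, one would start from `background = np.zeros((H, W, 3))` and `observed = np.zeros((H, W), dtype=bool)`, then call the function once per frame with that frame's matte. A plate recovered this way plays the role of the background prior discussed above, without requiring the user to supply one.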