We introduce a robust, real-time, high-resolution human video matting method that achieves new state-of-the-art performance. Our method is much lighter than previous approaches and can process 4K video at 76 FPS and HD video at 104 FPS on an Nvidia GTX 1080Ti GPU. Unlike most existing methods, which perform video matting frame-by-frame as independent images, our method uses a recurrent architecture to exploit temporal information in videos, achieving significant improvements in temporal coherence and matting quality. Furthermore, we propose a novel training strategy that trains our network on both matting and segmentation objectives, which significantly improves the model's robustness. Our method does not require any auxiliary inputs, such as a trimap or a pre-captured background image, so it can be widely applied to existing human matting applications.
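The key architectural idea above, carrying a recurrent state across frames rather than matting each frame independently, can be illustrated with a minimal sketch. Note that `matting_step` below is a toy stand-in (a simple temporal blend), not the paper's actual network; only the control flow of threading a hidden state through a video mirrors the described recurrent design.

```python
def matting_step(frame, state, blend=0.7):
    """Toy per-frame 'network': predicts a per-pixel alpha by blending
    the current frame's crude estimate with the recurrent state.
    (Hypothetical stand-in for the paper's recurrent matting network.)"""
    # Crude per-pixel alpha estimate: clamp intensities into [0, 1].
    crude_alpha = [min(max(p, 0.0), 1.0) for p in frame]
    if state is None:
        # First frame: no temporal context yet.
        return crude_alpha, crude_alpha
    # Blend with the carried state, smoothing the prediction over time.
    alpha = [blend * c + (1.0 - blend) * s for c, s in zip(crude_alpha, state)]
    return alpha, alpha  # the new alpha doubles as the next hidden state

def mat_video(frames):
    """Process a video sequentially, threading the recurrent state so each
    frame's prediction can exploit temporal information from earlier frames."""
    state, alphas = None, []
    for frame in frames:
        alpha, state = matting_step(frame, state)
        alphas.append(alpha)
    return alphas
```

A frame-by-frame method corresponds to calling `matting_step(frame, None)` for every frame; the recurrent loop instead lets transient per-frame errors be damped by the accumulated state, which is the source of the temporal-coherence gains described above.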