Recent studies have made great progress in video matting by extending the success of trimap-based image matting to the video domain. In this paper, we push this task toward a more practical setting and propose the One-Trimap Video Matting network (OTVM), which performs video matting robustly using only one user-annotated trimap. The key to OTVM is the joint modeling of trimap propagation and alpha prediction. Starting from baseline trimap propagation and alpha prediction networks, OTVM combines the two networks with an alpha-trimap refinement module to facilitate information flow. We also present an end-to-end training strategy to take full advantage of the joint model. Compared to previous decoupled methods, our joint modeling greatly improves the temporal stability of trimap propagation. We evaluate our model on two recent video matting benchmarks, Deep Video Matting and VideoMatting108, and outperform the state of the art by significant margins (MSE improvements of 56.4% and 56.7%, respectively). The source code and model are available online: https://github.com/Hongje/OTVM.
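To make the joint design concrete, below is a minimal PyTorch-style sketch of the per-frame inference loop implied by the abstract: a single user trimap is propagated frame to frame, refined using alpha cues, and consumed by the alpha prediction network. All module and class names here (`trimap_net`, `alpha_net`, `refine_net`, `OTVMSketch`) are hypothetical placeholders for illustration, not the actual OTVM implementation; see the linked repository for the real code.

```python
import torch
import torch.nn as nn

class OTVMSketch(nn.Module):
    """Illustrative joint model: trimap propagation + alpha prediction,
    coupled by an alpha-trimap refinement step and run frame by frame."""

    def __init__(self, trimap_net: nn.Module, alpha_net: nn.Module,
                 refine_net: nn.Module):
        super().__init__()
        self.trimap_net = trimap_net  # propagates the trimap to the next frame
        self.alpha_net = alpha_net    # predicts alpha given a frame + trimap
        self.refine_net = refine_net  # refines the propagated trimap with alpha cues

    def forward(self, frames: torch.Tensor, trimap0: torch.Tensor):
        # frames: (T, C, H, W); trimap0: user-annotated trimap for frame 0
        trimaps = [trimap0]
        alpha = self.alpha_net(frames[0], trimap0)
        alphas = [alpha]
        trimap = trimap0
        for t in range(1, frames.shape[0]):
            # 1) propagate the trimap from the previous frame
            trimap = self.trimap_net(frames[t - 1], trimap, frames[t])
            # 2) refine it using the previous alpha (the joint coupling)
            trimap = self.refine_net(frames[t], trimap, alpha)
            # 3) predict alpha for the current frame from the refined trimap
            alpha = self.alpha_net(frames[t], trimap)
            trimaps.append(trimap)
            alphas.append(alpha)
        return torch.stack(alphas), torch.stack(trimaps)
```

In this sketch, the refinement step is what distinguishes the joint model from decoupled pipelines: the trimap fed to the alpha network at frame t depends on the alpha estimated at frame t-1, so errors in propagation can be corrected rather than accumulated.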