For human matting without a green screen, existing works either require auxiliary inputs that are costly to obtain or use multiple models that are computationally expensive. Consequently, they are unsuitable for real-time applications. In contrast, we present a lightweight matting objective decomposition network (MODNet), which performs human matting from a single input image in real time. The design of MODNet benefits from optimizing a series of correlated sub-objectives simultaneously via explicit constraints. Moreover, since trimap-free methods usually suffer from the domain shift problem in practice, we introduce (1) a self-supervised strategy based on sub-objective consistency to adapt MODNet to real-world data and (2) a one-frame delay trick to smooth the results when applying MODNet to video human matting. MODNet is easy to train in an end-to-end manner. It is much faster than contemporaneous matting methods, running at 63 frames per second. On a carefully designed human matting benchmark newly proposed in this work, MODNet greatly outperforms prior trimap-free methods. More importantly, our method achieves remarkable results on daily photos and videos. Now, do you really need a green screen for real-time human matting?
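To make the one-frame delay trick concrete, below is a minimal NumPy sketch of one plausible reading of it: a pixel in frame t is treated as flicker when its temporal neighbors (frames t-1 and t+1) agree with each other but the current value disagrees with both, in which case it is replaced by the average of the neighbors. The function name `one_frame_delay`, the threshold `eps`, and its default value are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def one_frame_delay(prev_alpha, cur_alpha, next_alpha, eps=0.05):
    """Smooth flicker in a sequence of predicted alpha mattes.

    All inputs are float arrays of the same shape with values in [0, 1].
    A pixel is flagged as flicker when the two neighboring frames agree
    (within eps) but the current frame deviates from both; such pixels
    are replaced by the average of the neighboring values.
    """
    neighbors_agree = np.abs(prev_alpha - next_alpha) <= eps
    cur_deviates = (np.abs(cur_alpha - prev_alpha) > eps) & \
                   (np.abs(cur_alpha - next_alpha) > eps)
    flicker = neighbors_agree & cur_deviates

    smoothed = cur_alpha.copy()
    smoothed[flicker] = 0.5 * (prev_alpha[flicker] + next_alpha[flicker])
    return smoothed
```

The name reflects the cost of the trick: correcting frame t requires frame t+1 to be available, so the output stream lags the input by exactly one frame, which is why it remains compatible with real-time use.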