Most matting research resorts to advanced semantics to achieve high-quality alpha mattes, and the direct combination of low-level features is usually explored to complement alpha details. However, we argue that appearance-agnostic integration can only provide biased foreground details, and that alpha mattes require feature aggregation at different levels for better pixel-wise opacity perception. In this paper, we propose an end-to-end Hierarchical and Progressive Attention Matting Network (HAttMatting++), which can better predict the opacity of the foreground from a single RGB image without additional input. Specifically, we utilize channel-wise attention to distill pyramidal features and employ spatial attention at different levels to filter appearance cues. This progressive attention mechanism can estimate alpha mattes from adaptive semantics and semantics-indicated boundaries. We also introduce a hybrid loss function fusing Structural SIMilarity (SSIM), Mean Square Error (MSE), adversarial loss, and sentry supervision to guide the network to further improve the overall foreground structure. Besides, we construct a large-scale and challenging image matting dataset comprising 59,600 training images and 1,000 test images (a total of 646 distinct foreground alpha mattes), which further improves the robustness of our hierarchical and progressive aggregation model. Extensive experiments demonstrate that the proposed HAttMatting++ can capture sophisticated foreground structures and achieves state-of-the-art performance with single RGB images as input.
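The SSIM and MSE terms of the hybrid loss can be sketched as below. This is a minimal illustration, not the paper's implementation: it uses a simplified single-window SSIM rather than a sliding-window version, the weights `w_ssim` and `w_mse` are hypothetical, and the adversarial and sentry-supervision terms are omitted.

```python
import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified SSIM computed over the whole alpha matte as one window
    # (the usual formulation averages SSIM over local sliding windows).
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

def hybrid_loss(pred, gt, w_ssim=0.5, w_mse=0.5):
    # Structural term (1 - SSIM) plus pixel-wise MSE between the predicted
    # and ground-truth alpha mattes; weights here are illustrative only.
    structural = 1.0 - ssim_global(pred, gt)
    pixel = np.mean((pred - gt) ** 2)
    return w_ssim * structural + w_mse * pixel
```

For a perfect prediction the SSIM term is 1 and the MSE term is 0, so the loss vanishes; the structural term penalizes mismatches in overall foreground structure that plain MSE under-weights.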