Deep image matting methods have achieved increasingly better results on benchmarks (e.g., Composition-1k and alphamatting.com). However, their robustness, including robustness to trimaps and generalization to images from different domains, is still under-explored. Although some works propose either to refine the trimaps or to adapt the algorithms to real-world images via extra data augmentation, none of them takes both into consideration, and such data augmentation also causes significant performance deterioration on benchmarks. To fill this gap, we propose an image matting method that achieves higher robustness (RMat) via multilevel context assembling and strong data augmentation targeting matting. Specifically, we first build a strong matting framework by modeling ample global information with transformer blocks in the encoder, and by focusing on details in the decoder with convolution layers combined with a low-level feature assembling attention block. Then, based on this strong baseline, we analyze current data augmentation and explore simple but effective strong data augmentation to boost the baseline model and contribute a more generalizable matting method. Compared with previous methods, the proposed method not only achieves state-of-the-art results on the Composition-1k benchmark (11% improvement on SAD and 27% improvement on Grad) with a smaller model size, but also, in our extensive experiments, generalizes more robustly to other benchmarks, to real-world images, and to trimaps varying from coarse to fine.
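For readers unfamiliar with the SAD metric cited above, the following is a minimal sketch of how a sum of absolute differences between a predicted and a ground-truth alpha matte can be computed over the unknown region of a trimap. The function name `sad_unknown`, the trimap encoding (0 = background, 128 = unknown, 255 = foreground), and the toy values are assumptions made for illustration; they are not taken from the paper, and benchmark reports may apply additional scaling that is omitted here.

```python
# Assumed trimap encoding (not from the paper):
# 0 = background, 128 = unknown, 255 = foreground.
UNKNOWN = 128


def sad_unknown(pred, gt, trimap):
    """Sum of absolute alpha differences over pixels marked unknown in the trimap.

    pred, gt: 2-D lists of alpha values in [0, 1].
    trimap:   2-D list of the same shape using the encoding above.
    """
    total = 0.0
    for pred_row, gt_row, tri_row in zip(pred, gt, trimap):
        for p, g, t in zip(pred_row, gt_row, tri_row):
            if t == UNKNOWN:  # only the uncertain band counts toward the error
                total += abs(p - g)
    return total


# Toy 2x3 example (hypothetical values).
pred = [[0.0, 0.5, 1.0],
        [0.25, 0.75, 1.0]]
gt = [[0.0, 0.25, 1.0],
      [0.0, 1.0, 1.0]]
trimap = [[0, 128, 255],
          [128, 128, 255]]

print(sad_unknown(pred, gt, trimap))  # 0.75
```

Restricting the sum to the unknown band means that a coarser trimap, with a wider unknown region, exposes more pixels to error. This is one reason robustness to trimap precision is worth evaluating separately.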