Image matting predicts the alpha values of unknown foreground regions in natural images. Prior methods have focused on propagating alpha values from known to unknown regions. However, not all natural images have a clearly known foreground: images of transparent objects, such as glass, smoke, and webs, have little or no known foreground. In this paper, we propose a Transformer-based network, TransMatting, to model transparent objects with a large receptive field. Specifically, we redesign the trimap as three learnable tri-tokens that introduce advanced semantic features into the self-attention mechanism. We also propose a small convolutional network that uses the global feature and a non-background mask to guide multi-scale feature propagation from encoder to decoder, preserving the contexture of transparent objects. In addition, we create a high-resolution matting dataset of transparent objects with small known foreground areas. Experiments on several matting benchmarks demonstrate the superiority of our proposed method over current state-of-the-art methods.
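To make the terms above concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard compositing model, in which an observed pixel is alpha * foreground + (1 - alpha) * background, and it illustrates one plausible reading of "tri-tokens": one vector per trimap class, looked up per pixel. The label values and all function names are hypothetical, chosen only for illustration. In a real network the token vectors would be learned parameters rather than fixed lists.

```python
from typing import List, Sequence

# Hypothetical trimap labels (an assumption; the abstract does not fix an encoding).
BACKGROUND, UNKNOWN, FOREGROUND = 0, 1, 2


def composite_pixel(alpha: float, fg: Sequence[float], bg: Sequence[float]) -> List[float]:
    """Standard compositing model: I = alpha * F + (1 - alpha) * B, per channel."""
    return [alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg)]


def tri_token_lookup(trimap: Sequence[Sequence[int]],
                     tokens: Sequence[Sequence[float]]) -> List[List[List[float]]]:
    """Replace each trimap label with the token vector for that class.

    `tokens` holds three vectors, indexed by BACKGROUND, UNKNOWN, FOREGROUND.
    """
    return [[list(tokens[label]) for label in row] for row in trimap]


def foreground_fraction(trimap: Sequence[Sequence[int]]) -> float:
    """Share of pixels labeled as known foreground."""
    labels = [label for row in trimap for label in row]
    return sum(1 for label in labels if label == FOREGROUND) / len(labels)


if __name__ == "__main__":
    trimap = [[BACKGROUND, UNKNOWN], [UNKNOWN, FOREGROUND]]
    tokens = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]  # stand-ins for learned vectors
    print(tri_token_lookup(trimap, tokens))
    print(foreground_fraction(trimap))
    print(composite_pixel(0.25, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
```

For a transparent object, `foreground_fraction` would be close to zero, which is the case the abstract describes: there is little known foreground from which to propagate alpha values.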