引用图像设置 (Referring Image Matting)

Image matting refers to extracting the accurate foregrounds in the image. Current automatic methods tend to extract all the salient objects in the image indiscriminately. In this paper, we propose a new task named Referring Image Matting (RIM), referring to extracting the meticulous alpha matte of the specific object that can best match the given natural language description. However, prevalent visual grounding methods are all limited to the segmentation level, probably due to the lack of high-quality datasets for RIM. To fill the gap, we establish the first large-scale challenging dataset RefMatte by designing a comprehensive image composition and expression generation engine to produce synthetic images on top of current public high-quality matting foregrounds with flexible logics and re-labelled diverse attributes. RefMatte consists of 230 object categories, 47,500 images, 118,749 expression-region entities, and 474,996 expressions, which can be further extended easily in the future. Besides this, we also construct a real-world test set with manually generated phrase annotations consisting of 100 natural images to further evaluate the generalization of RIM models. We first define the task of RIM in two settings, i.e., prompt-based and expression-based, and then benchmark several representative methods together with specific model designs for image matting. The results provide empirical insights into the limitations of existing methods as well as possible solutions. We believe the new task RIM along with the RefMatte dataset will open new research directions in this area and facilitate future studies. The dataset and code will be made publicly available at https://github.com/JizhiziLi/RIM.

翻译：图像交配是指在图像中提取准确的前方。当前自动方法往往会不加区别地提取图像中的所有突出对象。在本文中, 我们提议一个新的任务, 名为参考图像交配( RIM), 以提取最符合给定自然语言描述的具体对象的精细阿尔法配方。但是, 普通的视觉基底方法都局限于分层层面, 可能是因为 RIM 缺少高质量的数据集。为了填补空白, 我们建立第一个具有挑战性的大型数据集 RefMatte, 设计一个全面的图像组成和表达式生成引擎, 以在当前公共高质量配制前方的顶端生成合成图像, 使用灵活的逻辑和重新标签的不同属性。 RefMatte 包含230个对象类别, 47, 500 118, 7449个表达式区域实体, 474, 996 表达式, 今后可以更加容易扩展。此外, 我们还将构建一个真实世界测试组, 包含100个自然图像以进一步评价 RIM 模型/表达式生成的演示文。我们首先在两个基于当前模型的模型的图像区域中定义,,, 将确定未来数据格式, 格式在特定的模型中, 格式中, 格式中, 将使用两种格式中, 格式设计中, 将提供新的格式定义,,, 将提供新的格式, 将提供新的格式, 格式, 格式, 将提供新的格式, 格式, 格式, 格式, 格式,, 将使用新的格式, 格式, 格式,,,,,,,,,, 将使用新的格式, 以以将使用新的格式, 以将将将格式定义为格式定义为格式, 格式, 格式, 格式, 格式, 格式, 格式, 格式, 格式, 格式, 格式, 格式, 格式, 格式, 格式,, 格式, 以格式化为格式化,, 格式化为格式, 格式, 格式, 格式, 格式, 格式, 格式, 格式,, 以,, 格式,,,,, 格式,

相关内容

数据集

关注 88

数据集，又称为资料集、数据集合或资料集合，是一种由数据所组成的集合。
Data set（或dataset）是一个数据的集合，通常以表格形式出现。每一列代表一个特定变量。每一行都对应于某一成员的数据集的问题。它列出的价值观为每一个变量，如身高和体重的一个物体或价值的随机数。每个数值被称为数据资料。对应于行数，该数据集的数据可能包括一个或多个成员。

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日