We introduce a new image segmentation task, called Entity Segmentation (ES), which aims to segment all visual entities (both things and stuff) in an image without predicting their semantic labels. By removing the need for class label prediction, models trained for this task can focus more on improving segmentation quality. ES has many practical applications, such as image manipulation and editing, where the quality of segmentation masks is crucial but class labels are less important. We conduct the first-ever study to investigate the feasibility of a convolutional center-based representation for segmenting things and stuff in a unified manner, and show that such a representation fits exceptionally well in the context of ES. More specifically, we propose a CondInst-like fully convolutional architecture with two novel modules specifically designed to exploit the class-agnostic and non-overlapping requirements of ES. Experiments show that models designed and trained for ES significantly outperform popular class-specific panoptic segmentation models in terms of segmentation quality. Moreover, an ES model can easily be trained on a combination of multiple datasets without the need to resolve label conflicts in dataset merging, and a model trained for ES on one or more datasets generalizes very well to test datasets of unseen domains. The code has been released at https://github.com/dvlab-research/Entity/.
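To make the non-overlapping requirement concrete, here is a minimal sketch of one common way to merge class-agnostic soft masks into a single entity map: weight each mask by its confidence score and take a per-pixel argmax. The function name, shapes, and the 0.5 background threshold are illustrative assumptions, not the paper's actual modules.

```python
import numpy as np

def resolve_overlaps(mask_probs, scores):
    """Merge overlapping class-agnostic masks into a non-overlapping entity map.

    mask_probs: (N, H, W) array of per-entity soft masks in [0, 1].
    scores:     (N,) array of per-entity confidence scores.
    Returns an (H, W) integer map of entity ids; -1 marks background.
    Note: this is an illustrative sketch, not the architecture from the paper.
    """
    # Weight each soft mask by its entity confidence.
    weighted = mask_probs * scores[:, None, None]
    # Each pixel is assigned to the most confident entity covering it.
    entity_map = weighted.argmax(axis=0)
    # Pixels no entity claims confidently are labeled background (-1).
    entity_map[weighted.max(axis=0) < 0.5] = -1
    return entity_map

# Toy example: two entities whose masks overlap on row 1 of a 4x4 image.
masks = np.zeros((2, 4, 4))
masks[0, :2] = 0.9   # entity 0 covers the top two rows
masks[1, 1:] = 0.8   # entity 1 covers the bottom three rows
scores = np.array([0.95, 0.90])
entity_map = resolve_overlaps(masks, scores)
```

On the contested row, entity 0 wins because its weighted score (0.9 × 0.95) exceeds entity 1's (0.8 × 0.90), so every pixel ends up owned by exactly one entity.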