In this paper, we consider the problem of open-vocabulary semantic segmentation (OVS), which aims to segment objects of arbitrary classes rather than pre-defined, closed-set categories. The main contributions are as follows: First, we propose a transformer-based model for OVS, termed OVSegmentor, which exploits only web-crawled image-text pairs for pre-training, without using any mask annotations. OVSegmentor assembles the image pixels into a set of learnable group tokens via a slot-attention based binding module, and aligns the group tokens to the corresponding caption embedding. Second, we propose two proxy tasks for training, namely masked entity completion and cross-image mask consistency. The former aims to infer all masked entities in the caption given the group tokens, which enables the model to learn fine-grained alignment between visual groups and text entities. The latter enforces consistent mask predictions between images that contain shared entities, which encourages the model to learn visual invariance. Third, we construct the CC4M dataset for pre-training by filtering CC12M for frequently appearing entities, which significantly improves training efficiency. Fourth, we perform zero-shot transfer on three benchmark datasets: PASCAL VOC 2012, PASCAL Context, and COCO Object. Our model achieves superior segmentation results over the state-of-the-art method while using only 3\% of the data (4M vs. 134M) for pre-training. Code and pre-trained models will be released for future research.
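To make the binding step concrete, the following is a minimal NumPy sketch of generic slot attention grouping features into a fixed set of slots (group tokens). It is an illustration of the general technique only, not the authors' OVSegmentor implementation; the function name, hyperparameters, and the simplified weighted-mean update (in place of a learned GRU/MLP update) are assumptions.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention(inputs, num_slots=4, iters=3, seed=0):
    """Assign N feature vectors (e.g. pixel/patch embeddings) to K slots.

    inputs: (N, D) array. Returns (slots, assign) where slots is (K, D)
    and assign is the hard slot index per input token, i.e. a crude mask.
    Simplified sketch: random slot init, weighted-mean slot update.
    """
    rng = np.random.default_rng(seed)
    n, d = inputs.shape
    slots = rng.normal(size=(num_slots, d))
    for _ in range(iters):
        logits = slots @ inputs.T / np.sqrt(d)   # (K, N) slot-token similarities
        # softmax over slots: tokens compete for slots (the "binding" step)
        attn = softmax(logits, axis=0)
        # normalize per slot, then take a weighted mean of the input features
        attn = attn / (attn.sum(axis=1, keepdims=True) + 1e-8)
        slots = attn @ inputs                    # (K, D) updated slot features
    # hard assignment of each token to its most attended slot
    assign = np.argmax(slots @ inputs.T, axis=0)
    return slots, assign
```

In the paper's setting, the resulting slots play the role of group tokens, which are then aligned with the caption embedding via the contrastive and proxy objectives described above.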