Semantic segmentation has a broad range of applications, but its real-world impact has been significantly limited by the prohibitive annotation costs required for deployment. Segmentation methods that forgo supervision can side-step these costs, but still require labelled examples from the target distribution in order to assign concept names to their predictions. An alternative line of work in language-image pre-training has recently demonstrated the potential to produce models that can both assign names across large vocabularies of concepts and enable zero-shot transfer for classification, but these models do not demonstrate commensurate segmentation abilities. In this work, we strive for a synthesis of these two approaches that combines their strengths. We leverage the retrieval abilities of one such language-image pre-trained model, CLIP, to dynamically curate training sets from unlabelled images for arbitrary collections of concept names, and leverage the robust correspondences offered by modern image representations to co-segment entities among the resulting collections. The synthetic segment collections are then employed to construct a segmentation model (without requiring pixel labels) whose knowledge of concepts is inherited from the scalable pre-training process of CLIP. We demonstrate that our approach, termed Retrieve and Co-segment (ReCo), compares favourably with unsupervised segmentation approaches while inheriting the convenience of nameable predictions and zero-shot transfer. We also demonstrate ReCo's ability to generate specialist segmenters for extremely rare objects.
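To make the retrieval step concrete, the following is a minimal sketch, not the authors' implementation, of how CLIP's image-text similarity can curate a per-concept training set from unlabelled images. It uses the open-source CLIP package (github.com/openai/CLIP); the function name `retrieve_training_set`, the prompt template, and the choice of `k` are illustrative assumptions, and the co-segmentation and segmenter-training stages are omitted.

```python
# Minimal sketch of CLIP-based retrieval for curating a per-concept training
# set from unlabelled images (assumptions: prompt template, k, function name).
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def retrieve_training_set(concept: str, image_paths, k: int = 50):
    """Rank unlabelled images by CLIP similarity to a concept name; keep top-k."""
    with torch.no_grad():
        # Embed the concept name with a simple prompt template (illustrative).
        text = clip.tokenize([f"a photo of a {concept}"]).to(device)
        text_feat = model.encode_text(text)
        text_feat /= text_feat.norm(dim=-1, keepdim=True)

        # Score every unlabelled image by cosine similarity to the text embedding.
        scores = []
        for path in image_paths:
            image = preprocess(Image.open(path)).unsqueeze(0).to(device)
            img_feat = model.encode_image(image)
            img_feat /= img_feat.norm(dim=-1, keepdim=True)
            scores.append((img_feat @ text_feat.T).item())

    # The top-k images form the curated collection passed on to co-segmentation.
    ranked = sorted(zip(scores, image_paths), reverse=True)
    return [path for _, path in ranked[:k]]

# Example usage (hypothetical data): curated = retrieve_training_set("fire hydrant", unlabelled_paths)
```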