We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies pre-trained text-image diffusion and discriminative models to perform open-vocabulary panoptic segmentation. Text-to-image diffusion models have shown the remarkable capability of generating high-quality images from diverse open-vocabulary language descriptions. This demonstrates that their internal representation space is highly correlated with open concepts in the real world. Text-image discriminative models like CLIP, on the other hand, are good at classifying images into open-vocabulary labels. We propose to leverage the frozen representations of both of these models to perform panoptic segmentation of any category in the wild. Our approach outperforms the previous state of the art by significant margins on both open-vocabulary panoptic and semantic segmentation tasks. In particular, with COCO training only, our method achieves 23.4 PQ and 30.0 mIoU on the ADE20K dataset, an absolute improvement of 8.3 PQ and 7.9 mIoU over the previous state of the art. The project page is available at https://jerryxu.net/ODISE .
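To make the high-level recipe concrete, the following is a minimal sketch of the idea described above: a frozen backbone supplies image features, a trainable head produces mask proposals with per-mask embeddings, and open-vocabulary classification is done by cosine similarity against text embeddings. All names here (`FrozenDiffusionBackbone`, `MaskHead`, `classify_masks`) are hypothetical stand-ins, not the paper's implementation; in ODISE the backbone would be a frozen text-to-image diffusion model's internal features and the text embeddings would come from a frozen CLIP text encoder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrozenDiffusionBackbone(nn.Module):
    """Hypothetical stand-in for the frozen diffusion model's
    internal feature maps (a single conv layer here)."""
    def __init__(self, dim=256):
        super().__init__()
        self.conv = nn.Conv2d(3, dim, kernel_size=4, stride=4)
        for p in self.parameters():
            p.requires_grad_(False)  # frozen, as in the abstract

    def forward(self, x):
        return self.conv(x)  # (B, dim, H/4, W/4) feature map


class MaskHead(nn.Module):
    """Trainable head: N query-based mask proposals plus a
    per-mask embedding for open-vocabulary classification."""
    def __init__(self, dim=256, num_queries=16, embed_dim=512):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.to_embed = nn.Linear(dim, embed_dim)

    def forward(self, feats):
        B, C, H, W = feats.shape
        flat = feats.flatten(2)                        # (B, C, HW)
        attn = torch.einsum("qc,bcp->bqp", self.queries, flat)
        masks = attn.view(B, -1, H, W)                 # mask logits per query
        # pool features under each soft mask -> one embedding per mask
        weights = attn.softmax(dim=-1)
        pooled = torch.einsum("bqp,bcp->bqc", weights, flat)
        return masks, self.to_embed(pooled)            # (B, Q, embed_dim)


def classify_masks(mask_embeds, text_embeds, temperature=0.07):
    """Open-vocabulary classification: cosine similarity between mask
    embeddings and (frozen) text embeddings of arbitrary category names."""
    m = F.normalize(mask_embeds, dim=-1)
    t = F.normalize(text_embeds, dim=-1)
    return torch.einsum("bqd,kd->bqk", m, t) / temperature


# Usage sketch: in practice text_embeds would be CLIP text-encoder outputs
# for user-supplied category names at test time; random here for illustration.
backbone = FrozenDiffusionBackbone()
head = MaskHead()
image = torch.randn(1, 3, 64, 64)
text_embeds = torch.randn(5, 512)  # 5 open-vocabulary categories
masks, embeds = head(backbone(image))
logits = classify_masks(embeds, text_embeds)  # (1, 16 queries, 5 classes)
```

Because the backbone and text embeddings stay frozen, only the mask head is trained (e.g., on COCO), which is what lets classification generalize to category names never seen during segmentation training.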