RegionCLIP: Region-based Language-Image Pretraining

Contrastive language-image pretraining (CLIP) using image-text pairs has achieved impressive results on image classification in both zero-shot and transfer learning settings. However, we show that directly applying such models to recognize image regions for object detection leads to poor performance due to a domain shift: CLIP was trained to match an image as a whole to a text description, without capturing the fine-grained alignment between image regions and text spans. To mitigate this issue, we propose a new method called RegionCLIP that significantly extends CLIP to learn region-level visual representations, thus enabling fine-grained alignment between image regions and textual concepts. Our method leverages a CLIP model to match image regions with template captions and then pretrains our model to align these region-text pairs in the feature space. When transferring our pretrained model to open-vocabulary object detection tasks, our method significantly outperforms the state of the art by 3.8 AP50 and 2.2 AP for novel categories on the COCO and LVIS datasets, respectively. Moreover, the learned region representations support zero-shot inference for object detection, showing promising results on both COCO and LVIS datasets. Our code is available at https://github.com/microsoft/RegionCLIP.
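
The two steps described in the abstract, matching regions to template captions and then aligning the resulting region-text pairs, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vectors stand in for real CLIP region and text embeddings, the template string "a photo of a {}" is an assumed prompt, and the function names `match_regions_to_captions` and `region_text_contrastive_loss` are hypothetical. The loss shown is a generic softmax contrastive loss, which may differ from the objective used in the released code.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def match_regions_to_captions(region_feats, caption_feats):
    """For each region, return the index of its most similar caption."""
    matches = []
    for r in region_feats:
        sims = [cosine(r, c) for c in caption_feats]
        matches.append(max(range(len(sims)), key=sims.__getitem__))
    return matches


def region_text_contrastive_loss(region_feats, caption_feats, matches,
                                 temperature=0.1):
    """Mean cross-entropy of each region against all captions,
    treating the matched caption as the positive."""
    total = 0.0
    for r, pos in zip(region_feats, matches):
        logits = [cosine(r, c) / temperature for c in caption_feats]
        m = max(logits)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[pos]
    return total / len(region_feats)


# Toy data (assumed): three concepts, two image regions.
concepts = ["dog", "bicycle", "tree"]
captions = [f"a photo of a {c}" for c in concepts]  # assumed template
caption_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
region_feats = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95]]

# Step 1: build pseudo region-text pairs from the best match per region.
matches = match_regions_to_captions(region_feats, caption_feats)
pseudo_pairs = [(i, captions[j]) for i, j in enumerate(matches)]

# Step 2: the loss that pretraining would minimize over these pairs.
loss = region_text_contrastive_loss(region_feats, caption_feats, matches)
print(pseudo_pairs, loss)
```

In a real pipeline the region features would come from a visual encoder applied to region proposals and the caption features from a text encoder; here both are hard-coded so the matching and loss logic can be read in isolation.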

