Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation

January 2, 2023

Jianzong Wu, Xiangtai Li, Henghui Ding, Xia Li, Guangliang Cheng, Yunhai Tong, Chen Change Loy

From arXiv, technical report

In this work, we focus on instance-level open vocabulary segmentation, aiming to extend a segmenter to novel categories at the instance level without mask annotations. We investigate a simple yet effective framework with the help of image captions, focusing on exploiting thousands of object nouns in captions to discover instances of novel classes. Rather than adopting pretrained caption models or using massive caption datasets with complex pipelines, we propose an end-to-end solution from two aspects: caption grounding and caption generation. In particular, we devise a joint Caption Grounding and Generation (CGG) framework based on a Mask Transformer baseline. The framework has a novel grounding loss that performs explicit and implicit multi-modal feature alignments. We further design a lightweight caption generation head to allow for additional caption supervision. We find that grounding and generation complement each other, significantly enhancing the segmentation performance for novel categories. We conduct extensive experiments on the COCO dataset with two settings: Open Vocabulary Instance Segmentation (OVIS) and Open Set Panoptic Segmentation (OSPS). The results demonstrate the superiority of our CGG framework over previous OVIS methods, achieving a large improvement of 6.8% mAP on novel classes without extra caption data. Our method also achieves over 15% PQ improvements for novel classes on the OSPS benchmark under various settings.
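The abstract does not give the form of the grounding loss. As a rough illustration of the general idea of caption grounding, the sketch below scores how well one image's object-query embeddings match the noun embeddings taken from a caption. It then applies an InfoNCE-style contrastive loss over a batch, so that each image should score highest with its own caption. The names `grounding_score` and `grounding_loss`, the scoring rule (best-matching query per noun, averaged over nouns), the image-to-caption-only direction, and the `temperature` parameter are all illustrative assumptions. They are not the CGG implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length embeddings."""
    return sum(x * y for x, y in zip(a, b))


def grounding_score(queries: List[Vector], nouns: List[Vector]) -> float:
    """Assumed image-caption score: match each caption noun to its most
    similar object query, then average those matches over the nouns."""
    if not nouns or not queries:
        return 0.0
    return sum(max(dot(q, w) for q in queries) for w in nouns) / len(nouns)


def grounding_loss(batch_queries: List[List[Vector]],
                   batch_nouns: List[List[Vector]],
                   temperature: float = 1.0) -> float:
    """Hypothetical InfoNCE-style grounding loss (image-to-caption only):
    image i should score higher with caption i than with other captions."""
    n = len(batch_queries)
    total = 0.0
    for i in range(n):
        logits = [grounding_score(batch_queries[i], batch_nouns[j]) / temperature
                  for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # negative log-softmax of the true pair
    return total / n


if __name__ == "__main__":
    queries = [[[1.0, 0.0]], [[0.0, 1.0]]]  # one query embedding per image
    nouns = [[[1.0, 0.0]], [[0.0, 1.0]]]    # one noun embedding per caption
    print(grounding_loss(queries, nouns))        # captions paired correctly
    print(grounding_loss(queries, nouns[::-1]))  # captions swapped
```

In this toy batch, correctly paired captions give a loss of log(1 + e) - 1 ≈ 0.313. Swapping the captions raises it to log(1 + e) ≈ 1.313, so minimizing the loss pulls query embeddings toward the nouns of their own caption. A real system would operate on learned tensors, typically add the caption-to-image direction, and the paper's implicit alignment and caption generation head are not modeled here.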