Class-agnostic Object Detection with Multi-modal Transformer


July 18, 2022


Muhammad Maaz, Hanoona Rasheed, Salman Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Ming-Hsuan Yang

From arXiv; accepted at ECCV 2022.

What constitutes an object? This has been a long-standing question in computer vision. Towards this goal, numerous learning-free and learning-based approaches have been developed to score objectness. However, they generally do not scale well across new domains and novel objects. In this paper, we advocate that existing methods lack a top-down supervision signal governed by human-understandable semantics. For the first time in the literature, we demonstrate that Multi-modal Vision Transformers (MViTs) trained with aligned image-text pairs can effectively bridge this gap. Our extensive experiments across various domains and novel objects show the state-of-the-art performance of MViTs in localizing generic objects in images. Based on the observation that existing MViTs do not include multi-scale feature processing and usually require longer training schedules, we develop an efficient MViT architecture using multi-scale deformable attention and late vision-language fusion. We show the significance of MViT proposals in a diverse range of applications, including open-world object detection, salient and camouflaged object detection, and supervised and self-supervised detection tasks. Further, MViTs can adaptively generate proposals given a specific language query and thus offer enhanced interactability. Code: https://git.io/J1HPY.
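
The abstract notes that proposals can be generated for a specific language query and used as generic, class-agnostic object proposals. The following is only an illustrative sketch of one common way to pool boxes from several queries into a single class-agnostic set, using greedy non-maximum suppression that ignores category labels. It is not taken from the paper or its code release: the function names (`iou`, `class_agnostic_nms`), the 0.5 IoU threshold, and the sample boxes are all assumptions made for the example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_nms(
    proposals: List[Tuple[Box, float]], iou_threshold: float = 0.5
) -> List[Tuple[Box, float]]:
    """Greedy NMS over (box, score) pairs; no category label is consulted."""
    kept: List[Tuple[Box, float]] = []
    # Visit proposals from highest to lowest score.
    for box, score in sorted(proposals, key=lambda p: p[1], reverse=True):
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept


# Made-up proposals, imagined as coming from two different text queries
# and pooled into one list (assumption for illustration only).
pooled = [
    ((10, 10, 50, 50), 0.9),      # query A
    ((12, 12, 52, 52), 0.8),      # query B, near-duplicate of the first box
    ((100, 100, 150, 150), 0.7),  # query B, a separate object
]
kept = class_agnostic_nms(pooled)
```

In this toy input, the second box overlaps the first heavily and is suppressed, so two proposals remain. A real pipeline would more likely rely on a tested library routine for NMS than on this hand-written loop.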

