[CVPR 2022] Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation

CVPR 2022 · Visual-Linguistic Verification · Iterative Reasoning · Visual Grounding · Cross-Modal

March 19, 2022

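In recent years, knowledge distillation from cross-modal models has driven rapid progress in open-vocabulary detection. However, we find that knowledge distillation with a one-stage detector performs far worse than with a two-stage detector. We attribute this gap to the following difference: in two-stage methods, the class-agnostic object proposals cover unseen categories, so the detector can learn semantic information about unseen categories during distillation. In one-stage methods, the positive samples as defined contain only known categories, so novel categories are never learned.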

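To compensate for this inherent shortcoming of one-stage methods, which lack class-agnostic object proposals, we propose a weakly supervised method that learns unseen-category objects implicitly. The method performs language-to-vision, global-level knowledge distillation through a cross-modal attention mechanism between the caption and the feature map. With this scheme, we significantly outperform the previous best open-vocabulary one-stage detector.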
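The idea of attending from a caption to a feature map and distilling at the global level can be illustrated with a small sketch. This is not the authors' implementation: the function names, the mean pooling step, the cosine-based loss, and the toy 3-dimensional embeddings are all assumptions made for illustration. Here the caption tokens act as queries over the flattened spatial positions of the feature map, and the pooled result is pulled toward a hypothetical teacher caption embedding.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(caption_tokens, feature_map):
    """Each caption token (query) attends over all spatial positions (keys/values).

    caption_tokens: list of T vectors of size d (assumed already projected to d).
    feature_map:    list of N vectors of size d (flattened H*W positions).
    Returns T attended visual vectors of size d.
    """
    d = len(feature_map[0])
    attended = []
    for q in caption_tokens:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in feature_map])
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, feature_map)) for i in range(d)]
        )
    return attended


def mean_pool(vectors):
    """Average a list of vectors into one global vector (an assumed pooling choice)."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine_distillation_loss(student, teacher):
    """1 - cosine similarity; an assumed stand-in for the distillation objective."""
    norm = math.sqrt(dot(student, student)) * math.sqrt(dot(teacher, teacher))
    return 1.0 - dot(student, teacher) / norm


# Toy inputs (hypothetical values): 2 caption tokens, 4 spatial positions, d = 3.
caption_tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
feature_map = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9], [0.5, 0.5, 0.0]]
teacher_caption_embedding = [0.6, 0.6, 0.1]  # e.g. from a frozen text encoder

attended = cross_modal_attention(caption_tokens, feature_map)
global_visual = mean_pool(attended)
loss = cosine_distillation_loss(global_visual, teacher_caption_embedding)
print(f"global distillation loss: {loss:.4f}")
```

Because the supervision comes only from the caption, no box or class label for unseen categories is needed in this sketch, which is the sense in which such a signal is weakly supervised.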

Authors: Li Yang, Yan Xu, Chunfeng Yuan*, Wei Liu, Bing Li, Weiming Hu

CVPR 2022

CVPR 2022 will be held June 21-24, 2022, in New Orleans, USA. CVPR stands for the IEEE Conference on Computer Vision and Pattern Recognition. Organized by the IEEE, it is a top conference in the field, and its main subject is computer vision and pattern recognition technology.
