Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding - Zhuanzhi Papers

December 14, 2020

Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding

Long Chen, Wenbo Ma, Jun Xiao, Hanwang Zhang, Shih-Fu Chang

From arXiv. To appear in AAAI 2021. Code is available at: https://github.com/ChopinSharp/ref-nms

The prevailing framework for referring expression grounding is a two-stage process: 1) detecting proposals with an object detector and 2) grounding the referent to one of the proposals. Existing two-stage solutions mostly focus on the grounding step, which aims to align the expression with the proposals. In this paper, we argue that these methods overlook an obvious mismatch between the roles of proposals in the two stages: they generate proposals based solely on detection confidence (i.e., expression-agnostic), hoping that the proposals contain all the right instances mentioned in the expression (i.e., expression-aware). Due to this mismatch, current two-stage methods suffer a severe performance drop when moving from ground-truth proposals to detected proposals. To this end, we propose Ref-NMS, the first method to yield expression-aware proposals at the first stage. Ref-NMS regards all nouns in the expression as critical objects and introduces a lightweight module that predicts, for each box, a score for its alignment with a critical object. These scores guide the NMS operation to filter out boxes irrelevant to the expression, which increases the recall of critical objects and significantly improves grounding performance. Since Ref-NMS is agnostic to the grounding step, it can be easily integrated into any state-of-the-art two-stage method. Extensive ablation studies on several backbones, benchmarks, and tasks consistently demonstrate the superiority of Ref-NMS. Code is available at: https://github.com/ChopinSharp/ref-nms.
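The idea of letting expression-relatedness scores guide NMS can be illustrated with a small sketch. This is not the authors' implementation: the function names `iou` and `score_guided_nms`, the choice to rank boxes by the product of detection confidence and relatedness score, and the IoU threshold of 0.5 are all assumptions made for illustration. The abstract only states that the predicted scores guide the NMS operation.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def score_guided_nms(
    boxes: Sequence[Box],
    det_conf: Sequence[float],
    relatedness: Sequence[float],
    iou_thresh: float = 0.5,
    top_k: Optional[int] = None,
) -> List[int]:
    """Greedy NMS ranked by a combined score (hypothetical helper).

    Assumption: the ranking score is det_conf * relatedness, so a box
    that matches the expression can outrank a more confident but
    irrelevant overlapping box. Returns indices of the kept boxes.
    """
    ranking = [d * r for d, r in zip(det_conf, relatedness)]
    order = sorted(range(len(boxes)), key=lambda i: ranking[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap a higher-ranked kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
            if top_k is not None and len(keep) == top_k:
                break
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    det_conf = [0.9, 0.8, 0.7]
    relatedness = [0.1, 0.9, 0.5]  # made-up scores for illustration
    print(score_guided_nms(boxes, det_conf, relatedness))
    print(score_guided_nms(boxes, det_conf, [1.0, 1.0, 1.0]))
```

With all relatedness scores set to 1.0 the function reduces to ordinary confidence-ranked NMS, which is the expression-agnostic behaviour the abstract criticises. In the toy input, the two overlapping boxes swap places once the relatedness scores are taken into account, so the box better aligned with the expression survives suppression.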

