VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching (Zhuanzhi Papers)

January 5, 2023

Chenchi Zhang, Wenbo Ma, Jun Xiao, Hanwang Zhang, Jian Shao, Yueting Zhuang, Long Chen

From arXiv. arXiv admin note: substantial text overlap with arXiv:2009.01449

The prevailing framework for matching multimodal inputs is based on a two-stage process: 1) detecting proposals with an object detector and 2) matching text queries with proposals. Existing two-stage solutions mostly focus on the matching step. In this paper, we argue that these methods overlook an obvious mismatch between the roles of proposals in the two stages: they generate proposals solely based on the detection confidence (i.e., query-agnostic), hoping that the proposals contain all instances mentioned in the text query (i.e., query-aware). Due to this mismatch, proposals relevant to the text query may be suppressed during the filtering process, which in turn bounds the matching performance. To this end, we propose VL-NMS, which is the first method to yield query-aware proposals at the first stage. VL-NMS regards all mentioned instances as critical objects, and introduces a lightweight module to predict a score for aligning each proposal with a critical object. These scores guide the NMS operation to filter out proposals irrelevant to the text query, which increases the recall of critical objects and significantly improves matching performance. Since VL-NMS is agnostic to the matching step, it can be easily integrated into any state-of-the-art two-stage matching method. We validate the effectiveness of VL-NMS on two multimodal matching tasks, namely referring expression grounding and image-text matching. Extensive ablation studies on several baselines and benchmarks consistently demonstrate the superiority of VL-NMS.
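The idea of letting a query-relevance score guide NMS can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the predicted score is combined with the detection confidence, so the product used below is an assumption, and the names `iou`, `nms`, and `query_aware_nms` as well as the toy boxes and scores are hypothetical. The relevance scores are supplied by hand here; in the paper they would come from the lightweight prediction module.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5,
        top_k: Optional[int] = None) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only
    if it does not overlap an already kept box above the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
            if top_k is not None and len(keep) == top_k:
                break
    return keep


def query_aware_nms(boxes: Sequence[Box], det_scores: Sequence[float],
                    relevance: Sequence[float],
                    iou_threshold: float = 0.5,
                    top_k: Optional[int] = None) -> List[int]:
    """Rank proposals by detection confidence times query relevance
    (the product is an assumption of this sketch), then run plain NMS."""
    combined = [d * r for d, r in zip(det_scores, relevance)]
    return nms(boxes, combined, iou_threshold, top_k)


# Toy data: box 0 and box 1 overlap heavily (IoU about 0.68).
# Box 0 has the higher detection confidence, but box 1 is the one
# the (hypothetical) text query refers to.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
det_scores = [0.9, 0.6, 0.8]
relevance = [0.1, 0.9, 0.2]

query_agnostic = nms(boxes, det_scores)                      # box 1 is suppressed
query_aware = query_aware_nms(boxes, det_scores, relevance)  # box 1 survives
```

In this toy case the query-agnostic pass keeps boxes 0 and 2 and discards the query-relevant box 1, while the relevance-weighted pass keeps boxes 1 and 2. This mirrors the mismatch described in the abstract: a proposal that matters for the query can lose to a more confident but irrelevant neighbor unless the ranking used by NMS is query-aware.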
