The current success of modern visual reasoning systems can arguably be attributed to cross-modality attention mechanisms. However, in deliberative reasoning such as VQA, attention is unconstrained at each step, and thus may serve as a statistical pooling mechanism rather than a semantic operation intended to select information relevant to inference. This is because at training time, attention is guided only by a very sparse signal (i.e. the answer label) at the end of the inference chain, which causes the cross-modality attention weights to deviate from the desired visual-language bindings. To rectify this deviation, we propose to guide the attention mechanism using explicit linguistic-visual grounding. This grounding is derived by connecting structured linguistic concepts in the query to their referents among the visual objects. We learn the grounding from the pairing of questions and images alone, without the need for answer annotation or external grounding supervision. The grounding guides the attention mechanism inside VQA models through two complementary mechanisms: pre-training the attention weight calculation and directly guiding the weights at inference time on a case-by-case basis. The resultant algorithm is capable of probing attention-based reasoning models, injecting relevant associative knowledge, and regulating the core reasoning process. This scalable enhancement improves the performance of VQA models, strengthens their robustness when access to supervised data is limited, and increases interpretability.
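To make the idea concrete, the following is a minimal, illustrative sketch (not the paper's implementation) of how cross-modality attention weights could be guided by an external grounding prior at inference time. The function `grounded_attention`, the linear blend, and the mixing coefficient `alpha` are assumptions introduced only for illustration; PyTorch is assumed.

```python
import torch
import torch.nn.functional as F

def grounded_attention(q, k, v, grounding, alpha=0.5):
    """Cross-modality attention whose weights are blended with an
    external linguistic-visual grounding prior (illustrative sketch).

    q:         (n_words, d)       query-side word features
    k, v:      (n_objs, d)        visual object keys and values
    grounding: (n_words, n_objs)  word-to-object grounding scores in [0, 1]
    alpha:     mixing weight between learned attention and the prior
    """
    d = q.size(-1)
    logits = q @ k.t() / d ** 0.5                 # learned cross-modality scores
    attn = F.softmax(logits, dim=-1)              # unconstrained attention weights
    prior = grounding / grounding.sum(dim=-1, keepdim=True).clamp(min=1e-6)
    attn = (1 - alpha) * attn + alpha * prior     # pull weights toward the grounding
    return attn @ v, attn

# Toy usage with random features and hypothetical grounding scores.
q = torch.randn(4, 64)   # 4 question words
k = torch.randn(7, 64)   # 7 detected objects
v = torch.randn(7, 64)
g = torch.rand(4, 7)     # hypothetical word-to-object grounding matrix
out, weights = grounded_attention(q, k, v, g)
```

The same grounding signal could also serve as a pre-training target for the attention weights; the blend above only illustrates the inference-time, case-by-case guidance described in the abstract.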