Visual Commonsense Reasoning (VCR), regarded as a challenging extension of Visual Question Answering (VQA), pursues higher-level visual comprehension. It comprises two indispensable processes: question answering over a given image and rationale inference that explains the answer. Over the years, a variety of methods tackling VCR have advanced performance on the benchmark dataset. Significant as these methods are, they often treat the two processes separately and hence decompose VCR into two independent VQA instances. As a result, the pivotal connection between question answering and rationale inference is severed, rendering existing efforts less faithful in visual reasoning. To study this issue empirically, we conduct in-depth explorations of both language shortcuts and generalization capability to verify the pitfalls of this treatment. Based on our findings, in this paper we present a plug-and-play knowledge-distillation-enhanced framework that couples the question answering and rationale inference processes. The key contribution is the introduction of a novel branch that serves as a bridge connecting the two processes. Since our framework is model-agnostic, we apply it to popular existing baselines and validate its effectiveness on the benchmark dataset. As the experimental results show, when equipped with our framework, these baselines achieve consistent and significant performance improvements, demonstrating the viability of coupling the two processes as well as the superiority of the proposed framework.
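To make the coupling idea concrete, the sketch below illustrates generic knowledge distillation between two answer-prediction branches. It is a minimal, hypothetical example and not the paper's actual architecture: the function name `distillation_loss`, the temperature and mixing weight, and the notion of a rationale-aware "bridge" branch acting as a teacher for the plain question-answering branch are all illustrative assumptions.

```python
# Hypothetical sketch: couple two VCR processes via knowledge distillation.
# A "teacher" branch that sees question + rationale guides a "student"
# question-answering branch through a soft-label KL-divergence term.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Mix hard-label cross-entropy with soft-label distillation (assumed form)."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Standard temperature-scaled KD term (Hinton et al. style).
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce


# Toy usage: VCR offers 4 answer choices per question; batch of 8 examples.
student_logits = torch.randn(8, 4, requires_grad=True)  # Q -> A branch
teacher_logits = torch.randn(8, 4)                      # (Q, R) -> A bridge branch
labels = torch.randint(0, 4, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

In such a setup, gradients from the distillation term push the question-answering branch toward predictions consistent with rationale-informed reasoning, which is one plausible reading of how a bridging branch could couple the two processes.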