实现人类在视觉问题回答上的平等 (Achieving Human Parity on Visual Question Answering) - 专知论文

会员服务 ·

0

视觉问答 · 自动问答 · Extensibility · MINE · INTERACT ·

2021 年 11 月 19 日

Achieving Human Parity on Visual Question Answering

翻译：实现人类在视觉问题回答上的平等

Ming Yan,Haiyang Xu,Chenliang Li,Junfeng Tian,Bin Bi,Wei Wang,Weihua Chen,Xianzhe Xu,Fan Wang,Zheng Cao,Zhicheng Zhang,Qiyu Zhang,Ji Zhang,Songfang Huang,Fei Huang,Luo Si,Rong Jin

The Visual Question Answering (VQA) task utilizes both visual image and language analysis to answer a textual question with respect to an image. It has been a popular research topic with an increasing number of real-world applications in the last decade. This paper describes our recent research of AliceMind-MMU (ALIbaba's Collection of Encoder-decoders from Machine IntelligeNce lab of Damo academy - MultiMedia Understanding) that obtains similar or even slightly better results than human being does on VQA. This is achieved by systematically improving the VQA pipeline including: (1) pre-training with comprehensive visual and textual feature representation; (2) effective cross-modal interaction with learning to attend; and (3) A novel knowledge mining framework with specialized expert modules for the complex VQA task. Treating different types of visual questions with corresponding expertise needed plays an important role in boosting the performance of our VQA architecture up to the human level. An extensive set of experiments and analysis are conducted to demonstrate the effectiveness of the new research work.

翻译：视觉问题解答(VQA)任务利用视觉图像和语言分析来解答关于图像的文字问题,在过去十年中,这是一个很受欢迎的研究课题,其实际应用数量不断增加,本文描述了我们最近对AliceMind-MMU(Allibaba's Concretain of Encoder-decoders from Machine IntelligeNce Lab of Damo Colum-Multi Media Aunication)的研究,该研究在VQA上取得了与人类相似甚至略微更好的成果。这是通过系统地改进VQA管道实现的,其中包括:(1) 接受全面的视觉和文字特征说明培训前;(2) 与学习有关的有效的跨模式互动;(3) 一个具有复杂VQA任务专门专家模块的新的知识挖掘框架。用所需的相应专门知识处理不同种类的视觉问题,在提高我们VQA结构在人类层面的绩效方面发挥着重要作用。进行了广泛的实验和分析,以证明新的研究工作的有效性。

0

相关内容

视觉问答

视觉问答（Visual Question Answering，VQA），是一种涉及计算机视觉和自然语言处理的学习任务。这一任务的定义如下： A VQA system takes as input an image and a free-form, open-ended, natural-language question about the image and produces a natural-language answer as the output[1]。翻译为中文：一个VQA系统以一张图片和一个关于这张图片形式自由、开放式的自然语言问题作为输入，以生成一条自然语言答案作为输出。简单来说，VQA就是给定的图片进行问答。

知识荟萃

精品入门和进阶教程、论文和代码整理等

更多

查看相关VIP内容、论文、资讯等

【Manning新书】微服务安全实战，616页pdf，Microservices Security in Action

【Manning新书】微服务安全实战，616页pdf，Microservices Security in Action

专知会员服务

46+阅读 · 2020年7月22日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【IBM】在视觉和关系推理中迁移学习，Transfer Learning in Visual and Relational Reasoning

【IBM】在视觉和关系推理中迁移学习，Transfer Learning in Visual and Relational Reasoning

专知会员服务

45+阅读 · 2020年1月15日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

【CVPR 2019|workshop】视觉问答和对话，Visual Question Answering and Dialog，斯坦福大学|Christopher Manning，Google DeepMind|Karl Moritz Hermann

【CVPR 2019|workshop】视觉问答和对话，Visual Question Answering and Dialog，斯坦福大学|Christopher Manning，Google DeepMind|Karl Moritz Hermann

专知会员服务

18+阅读 · 2019年6月17日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

大神一年100篇论文

大神一年100篇论文

CreateAMind

15+阅读 · 2018年12月31日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Jointly Improving Summarization and Sentiment Classification

Jointly Improving Summarization and Sentiment Classification

黑龙江大学自然语言处理实验室

3+阅读 · 2018年6月12日

【论文推荐】最新六篇视觉问答相关论文—鲁棒性分析、虚拟意象、双曲注意力网络、R-VQA、关系推理、双线性注意力网络

【论文推荐】最新六篇视觉问答相关论文—鲁棒性分析、虚拟意象、双曲注意力网络、R-VQA、关系推理、双线性注意力网络

专知

17+阅读 · 2018年6月7日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

推荐｜深度强化学习聊天机器人（附论文）！

推荐｜深度强化学习聊天机器人（附论文）！

全球人工智能

4+阅读 · 2018年1月30日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Medical Visual Question Answering: A Survey

Arxiv

15+阅读 · 2021年11月19日

Visual Question Answering using Deep Learning: A Survey and Performance Analysis

Arxiv

4+阅读 · 2019年8月27日

GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering

Arxiv

3+阅读 · 2019年5月10日

Visual Question Answering as Reading Comprehension

Arxiv

3+阅读 · 2018年11月29日

R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering

Arxiv

7+阅读 · 2018年5月24日

Joint Image Captioning and Question Answering

Arxiv

6+阅读 · 2018年5月22日

Reciprocal Attention Fusion for Visual Question Answering

Arxiv

5+阅读 · 2018年5月11日

Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering

Arxiv

5+阅读 · 2018年3月23日

The Web as a Knowledge-base for Answering Complex Questions

Arxiv

5+阅读 · 2018年3月18日

Hierarchical Question-Image Co-Attention for Visual Question Answering

Arxiv

3+阅读 · 2017年1月19日

VIP会员

文章信息

相关主题

相关VIP内容

【Manning新书】微服务安全实战，616页pdf，Microservices Security in Action

【Manning新书】微服务安全实战，616页pdf，Microservices Security in Action

专知会员服务

46+阅读 · 2020年7月22日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【IBM】在视觉和关系推理中迁移学习，Transfer Learning in Visual and Relational Reasoning

【IBM】在视觉和关系推理中迁移学习，Transfer Learning in Visual and Relational Reasoning

专知会员服务

45+阅读 · 2020年1月15日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

【CVPR 2019|workshop】视觉问答和对话，Visual Question Answering and Dialog，斯坦福大学|Christopher Manning，Google DeepMind|Karl Moritz Hermann

【CVPR 2019|workshop】视觉问答和对话，Visual Question Answering and Dialog，斯坦福大学|Christopher Manning，Google DeepMind|Karl Moritz Hermann

专知会员服务

18+阅读 · 2019年6月17日

热门VIP内容

开通专知VIP会员享更多权益服务

《步兵小单元山地严寒作战指南》美军最新条令200页

《联合作战概念的发展》最新报告

俄制无人机弹药

《复杂场景下自主着陆的模型预测控制技术》92页

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

大神一年100篇论文

大神一年100篇论文

CreateAMind

15+阅读 · 2018年12月31日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Jointly Improving Summarization and Sentiment Classification

Jointly Improving Summarization and Sentiment Classification

黑龙江大学自然语言处理实验室

3+阅读 · 2018年6月12日

【论文推荐】最新六篇视觉问答相关论文—鲁棒性分析、虚拟意象、双曲注意力网络、R-VQA、关系推理、双线性注意力网络

【论文推荐】最新六篇视觉问答相关论文—鲁棒性分析、虚拟意象、双曲注意力网络、R-VQA、关系推理、双线性注意力网络

专知

17+阅读 · 2018年6月7日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

推荐｜深度强化学习聊天机器人（附论文）！

推荐｜深度强化学习聊天机器人（附论文）！

全球人工智能

4+阅读 · 2018年1月30日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

相关论文

Medical Visual Question Answering: A Survey

Arxiv

15+阅读 · 2021年11月19日

Visual Question Answering using Deep Learning: A Survey and Performance Analysis

Arxiv

4+阅读 · 2019年8月27日

GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering

Arxiv

3+阅读 · 2019年5月10日

Visual Question Answering as Reading Comprehension

Arxiv

3+阅读 · 2018年11月29日

R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering

Arxiv

7+阅读 · 2018年5月24日

Joint Image Captioning and Question Answering

Arxiv

6+阅读 · 2018年5月22日

Reciprocal Attention Fusion for Visual Question Answering

Arxiv

5+阅读 · 2018年5月11日

Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering

Arxiv

5+阅读 · 2018年3月23日

The Web as a Knowledge-base for Answering Complex Questions

Arxiv

5+阅读 · 2018年3月18日

Hierarchical Question-Image Co-Attention for Visual Question Answering

Arxiv

3+阅读 · 2017年1月19日

微信扫码咨询专知VIP会员