OK-VQA:视觉问题回答基准要求外部知识 (OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge) - 专知论文

会员服务 ·

0

视觉问答 · 自动问答 · 数据集 · Performer · state-of-the-art ·

2019 年 9 月 4 日

OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge

翻译：OK-VQA:视觉问题回答基准要求外部知识

Kenneth Marino,Mohammad Rastegari,Ali Farhadi,Roozbeh Mottaghi

from arxiv, CVPR 2019

Visual Question Answering (VQA) in its ideal form lets us study reasoning in the joint space of vision and language and serves as a proxy for the AI task of scene understanding. However, most VQA benchmarks to date are focused on questions such as simple counting, visual attributes, and object detection that do not require reasoning or knowledge beyond what is in the image. In this paper, we address the task of knowledge-based visual question answering and provide a benchmark, called OK-VQA, where the image content is not sufficient to answer the questions, encouraging methods that rely on external knowledge resources. Our new dataset includes more than 14,000 questions that require external knowledge to answer. We show that the performance of the state-of-the-art VQA models degrades drastically in this new setting. Our analysis shows that our knowledge-based VQA task is diverse, difficult, and large compared to previous knowledge-based VQA datasets. We hope that this dataset enables researchers to open up new avenues for research in this domain. See http://okvqa.allenai.org to download and browse the dataset.

翻译：理想的视觉问题解答(VQA)让我们研究视觉和语言共同空间的推理,并充当AI现场理解任务的代理。然而,迄今为止,大多数VQA基准都侧重于简单计数、视觉属性和物体探测等问题,而这些问题并不要求推理或了解超出图像范围。在本文中,我们处理基于知识的视觉问题解答任务,并提供一个基准,称为 OK-VQA,其中图像内容不足以回答问题,鼓励使用外部知识资源的方法。我们的新数据集包括了超过14 000个需要外部知识回答的问题。我们显示,在这一新环境中,状态的VQA模型的性能急剧退化。我们的分析表明,我们基于知识的VQA任务与以往基于知识的VQA数据集相比是多种多样、困难和巨大的。我们希望这一数据集使研究人员能够打开这个领域的新的研究途径。见http://okvqa.allenai.org,以便下载和浏览数据集。

10

相关内容

视觉问答

视觉问答（Visual Question Answering，VQA），是一种涉及计算机视觉和自然语言处理的学习任务。这一任务的定义如下： A VQA system takes as input an image and a free-form, open-ended, natural-language question about the image and produces a natural-language answer as the output[1]。翻译为中文：一个VQA系统以一张图片和一个关于这张图片形式自由、开放式的自然语言问题作为输入，以生成一条自然语言答案作为输出。简单来说，VQA就是给定的图片进行问答。

知识荟萃

精品入门和进阶教程、论文和代码整理等

更多

查看相关VIP内容、论文、资讯等

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【AAAI2020】知识增强的视觉故事，Knowledge-Enriched Visual Storytelling，科罗拉多大学博德分校| Chao Chun Hsu，中国科学院博士| Lun-Wei Ku

【AAAI2020】知识增强的视觉故事，Knowledge-Enriched Visual Storytelling，科罗拉多大学博德分校| Chao Chun Hsu，中国科学院博士| Lun-Wei Ku

专知会员服务

26+阅读 · 2019年12月5日

【北京智源大会2019】增强人类智能：从搜索引擎到智能任务助理（ Augmenting Human Intelligence: From Search Engines to Intelligent Task Assistants ）

【北京智源大会2019】增强人类智能：从搜索引擎到智能任务助理（ Augmenting Human Intelligence: From Search Engines to Intelligent Task Assistants ）

专知会员服务

20+阅读 · 2019年11月22日

Natural Language Interface to Knowledge Graph (our experience) ，加州大学圣塔芭芭拉分校严锡峰副教授，CIPS ATT 16（2019）

Natural Language Interface to Knowledge Graph (our experience) ，加州大学圣塔芭芭拉分校严锡峰副教授，CIPS ATT 16（2019）

专知会员服务

16+阅读 · 2019年10月25日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

33+阅读 · 2019年10月18日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

PaperWeekly

7+阅读 · 2019年5月5日

由浅及深，细致解读图像问答 VQA 2018 Challenge 冠军模型 Pythia

由浅及深，细致解读图像问答 VQA 2018 Challenge 冠军模型 Pythia

极市平台

8+阅读 · 2019年3月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新七篇视觉问答（VQA）相关论文—差别注意力机制、视觉问题推理、视觉对话、数据可视化、记忆增强网络、显式推理

【论文推荐】最新七篇视觉问答（VQA）相关论文—差别注意力机制、视觉问题推理、视觉对话、数据可视化、记忆增强网络、显式推理

专知

17+阅读 · 2018年4月19日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

【论文推荐】最新6篇视觉问答（VQA）相关论文—目标推理、深度循环模型、可解释性、数据可视化、Triplet学习、基准

【论文推荐】最新6篇视觉问答（VQA）相关论文—目标推理、深度循环模型、可解释性、数据可视化、Triplet学习、基准

专知

15+阅读 · 2018年2月3日

论文浅尝 | Leveraging Knowledge Bases in LSTMs

论文浅尝 | Leveraging Knowledge Bases in LSTMs

开放知识图谱

6+阅读 · 2017年12月8日

GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering

Arxiv

3+阅读 · 2019年5月10日

Visual Question Answering as Reading Comprehension

Arxiv

3+阅读 · 2018年11月29日

R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering

Arxiv

7+阅读 · 2018年5月24日

Movie Question Answering: Remembering the Textual Cues for Layered Visual Contents

Arxiv

3+阅读 · 2018年4月25日

Differential Attention for Visual Question Answering

Arxiv

7+阅读 · 2018年4月1日

Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering

Arxiv

5+阅读 · 2018年3月23日

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

Arxiv

17+阅读 · 2018年3月20日

The Web as a Knowledge-base for Answering Complex Questions

Arxiv

5+阅读 · 2018年3月18日

iVQA: Inverse Visual Question Answering

Arxiv

5+阅读 · 2018年3月16日

VQA: Visual Question Answering

Arxiv

9+阅读 · 2016年10月27日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【AAAI2020】知识增强的视觉故事，Knowledge-Enriched Visual Storytelling，科罗拉多大学博德分校| Chao Chun Hsu，中国科学院博士| Lun-Wei Ku

【AAAI2020】知识增强的视觉故事，Knowledge-Enriched Visual Storytelling，科罗拉多大学博德分校| Chao Chun Hsu，中国科学院博士| Lun-Wei Ku

专知会员服务

26+阅读 · 2019年12月5日

【北京智源大会2019】增强人类智能：从搜索引擎到智能任务助理（ Augmenting Human Intelligence: From Search Engines to Intelligent Task Assistants ）

【北京智源大会2019】增强人类智能：从搜索引擎到智能任务助理（ Augmenting Human Intelligence: From Search Engines to Intelligent Task Assistants ）

专知会员服务

20+阅读 · 2019年11月22日

Natural Language Interface to Knowledge Graph (our experience) ，加州大学圣塔芭芭拉分校严锡峰副教授，CIPS ATT 16（2019）

Natural Language Interface to Knowledge Graph (our experience) ，加州大学圣塔芭芭拉分校严锡峰副教授，CIPS ATT 16（2019）

专知会员服务

16+阅读 · 2019年10月25日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

33+阅读 · 2019年10月18日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《自适应训练辅助系统概念导论及其在空战指挥官加速培训中的应用》125页

《美陆军近战整合企业现代化计划（2025—2026）》最新报告

以色列-伊朗空战：短暂而激烈冲突的启示

《动态作战支援演习框架构建》80页

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

PaperWeekly

7+阅读 · 2019年5月5日

由浅及深，细致解读图像问答 VQA 2018 Challenge 冠军模型 Pythia

由浅及深，细致解读图像问答 VQA 2018 Challenge 冠军模型 Pythia

极市平台

8+阅读 · 2019年3月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新七篇视觉问答（VQA）相关论文—差别注意力机制、视觉问题推理、视觉对话、数据可视化、记忆增强网络、显式推理

【论文推荐】最新七篇视觉问答（VQA）相关论文—差别注意力机制、视觉问题推理、视觉对话、数据可视化、记忆增强网络、显式推理

专知

17+阅读 · 2018年4月19日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

【论文推荐】最新6篇视觉问答（VQA）相关论文—目标推理、深度循环模型、可解释性、数据可视化、Triplet学习、基准

【论文推荐】最新6篇视觉问答（VQA）相关论文—目标推理、深度循环模型、可解释性、数据可视化、Triplet学习、基准

专知

15+阅读 · 2018年2月3日

论文浅尝 | Leveraging Knowledge Bases in LSTMs

论文浅尝 | Leveraging Knowledge Bases in LSTMs

开放知识图谱

6+阅读 · 2017年12月8日

相关论文

GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering

Arxiv

3+阅读 · 2019年5月10日

Visual Question Answering as Reading Comprehension

Arxiv

3+阅读 · 2018年11月29日

R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering

Arxiv

7+阅读 · 2018年5月24日

Movie Question Answering: Remembering the Textual Cues for Layered Visual Contents

Arxiv

3+阅读 · 2018年4月25日

Differential Attention for Visual Question Answering

Arxiv

7+阅读 · 2018年4月1日

Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering

Arxiv

5+阅读 · 2018年3月23日

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

Arxiv

17+阅读 · 2018年3月20日

The Web as a Knowledge-base for Answering Complex Questions

Arxiv

5+阅读 · 2018年3月18日

iVQA: Inverse Visual Question Answering

Arxiv

5+阅读 · 2018年3月16日

VQA: Visual Question Answering

Arxiv

9+阅读 · 2016年10月27日

微信扫码咨询专知VIP会员