An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA - Zhuanzhi (专知) Papers

Knowledge · Visual Question Answering · GPT-3 · Few-Shot Learning · Unstructured

September 14, 2022

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA

Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, Lijuan Wang

From arXiv, AAAI 2022 (Oral Presentation)

Knowledge-based visual question answering (VQA) involves answering questions that require external knowledge not present in the image. Existing methods first retrieve knowledge from external resources, then reason over the selected knowledge, the input image, and the question for answer prediction. However, this two-step approach could lead to mismatches that potentially limit the VQA performance. For example, the retrieved knowledge might be noisy and irrelevant to the question, and the re-embedded knowledge features during reasoning might deviate from their original meanings in the knowledge base (KB). To address this challenge, we propose PICa, a simple yet effective method that Prompts GPT-3 via the use of Image Captions, for knowledge-based VQA. Inspired by GPT-3's power in knowledge retrieval and question answering, instead of using structured KBs as in previous work, we treat GPT-3 as an implicit and unstructured KB that can jointly acquire and process relevant knowledge. Specifically, we first convert the image into captions (or tags) that GPT-3 can understand, then adapt GPT-3 to solve the VQA task in a few-shot manner by just providing a few in-context VQA examples. We further boost performance by carefully investigating: (i) what text formats best describe the image content, and (ii) how in-context examples can be better selected and used. PICa unlocks the first use of GPT-3 for multimodal tasks. By using only 16 examples, PICa surpasses the supervised state of the art by an absolute +8.6 points on the OK-VQA dataset. We also benchmark PICa on VQAv2, where PICa also shows a decent few-shot performance.
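
The caption-then-prompt idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt layout (`Context:` / `Q:` / `A:`), the header sentence, the `VQAExample` type, and every function name are assumptions made for illustration. The example selection uses a simple word-overlap score as a stand-in, since the abstract does not say how examples are chosen, and no language-model API is called. The sketch only builds the text that would be sent to the model.

```python
from dataclasses import dataclass

# Hypothetical instruction line; the abstract does not specify the wording.
HEADER = "Answer the question using the image description."


@dataclass
class VQAExample:
    caption: str      # text description of the image (a caption or a tag list)
    question: str
    answer: str = ""  # left empty for the query the model should answer


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets (a stand-in similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def select_examples(pool, query, n):
    """Return the n pool examples whose questions are most similar to the query's."""
    ranked = sorted(
        pool,
        key=lambda ex: token_overlap(ex.question, query.question),
        reverse=True,
    )
    return ranked[:n]


def format_block(ex: VQAExample) -> str:
    """Render one example; an empty answer leaves a trailing 'A:' to be completed."""
    return f"Context: {ex.caption}\nQ: {ex.question}\nA: {ex.answer}".rstrip()


def build_prompt(examples, query, header=HEADER) -> str:
    """Concatenate the header, the in-context examples, and the unanswered query."""
    blocks = [format_block(ex) for ex in examples]
    blocks.append(format_block(VQAExample(query.caption, query.question)))
    return header + "\n\n" + "\n\n".join(blocks)


# Toy data, invented for this sketch.
pool = [
    VQAExample("a man riding a wave on a surfboard", "what sport is this?", "surfing"),
    VQAExample("a red fire hydrant on a sidewalk", "what is the red object used for?", "firefighting"),
    VQAExample("a plate of pasta with tomato sauce", "what country is this dish from?", "italy"),
]
query = VQAExample("a pizza on a wooden table", "what country is this dish from?")

selected = select_examples(pool, query, 2)
prompt = build_prompt(selected, query)
print(prompt)
```

In a real pipeline the captions would come from an image captioning or tagging model, and the finished prompt would be passed to the language model, whose completion after the final `A:` is taken as the answer.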
