Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding

March 21, 2023

Morris Alper, Michael Fiman, Hadar Averbuch-Elor

From arXiv. To be presented at CVPR 2023. Project webpage: https://isbertblind.github.io/

Most humans use visual imagination to understand and reason about language, but models such as BERT reason about language using knowledge acquired during text-only pretraining. In this work, we investigate whether vision-and-language pretraining can improve performance on text-only tasks that involve implicit visual reasoning, focusing primarily on zero-shot probing methods. We propose a suite of visual language understanding (VLU) tasks for probing the visual reasoning abilities of text encoder models, as well as various non-visual natural language understanding (NLU) tasks for comparison. We also contribute a novel zero-shot knowledge probing method, Stroop probing, for applying models such as CLIP to text-only tasks without needing a prediction head such as the masked language modelling head of models like BERT. We show that SOTA multimodally trained text encoders outperform unimodally trained text encoders on the VLU tasks while being underperformed by them on the NLU tasks, lending new context to previously mixed results regarding the NLU capabilities of multimodal models. We conclude that exposure to images during pretraining affords inherent visual reasoning knowledge that is reflected in language-only tasks that require implicit visual reasoning. Our findings bear importance in the broader context of multimodal learning, providing principled guidelines for the choice of text encoders used in such contexts.
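The abstract names Stroop probing but does not describe its mechanics. The sketch below illustrates one way to probe a text encoder that has no prediction head, under a loudly stated assumption: each candidate word is inserted into a prompt, and candidates are ranked by the cosine similarity between the filled prompt's embedding and the embedding of the same prompt with the slot left blank. This scoring rule is an assumption made for illustration, not a confirmed description of the authors' method. The names `similarity_probe`, `toy_encode`, and `WORD_VECS` are hypothetical; `toy_encode` is a hand-written stand-in for a real text encoder such as CLIP's.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]

# Hypothetical hand-written word vectors standing in for a real text encoder.
WORD_VECS: Dict[str, Vector] = {
    "a": [0.1, 0.1, 1.0],
    "banana": [0.9, 0.1, 0.2],
    "yellow": [1.0, 0.0, 0.0],
    "red": [0.0, 1.0, 0.0],
}


def toy_encode(text: str) -> Vector:
    """Sum the vectors of known words; unknown words contribute nothing."""
    total = [0.0, 0.0, 0.0]
    for word in text.lower().split():
        for i, value in enumerate(WORD_VECS.get(word, [0.0, 0.0, 0.0])):
            total[i] += value
    return total


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_probe(
    encode: Callable[[str], Vector],
    template: str,
    candidates: Sequence[str],
    slot: str = "[*]",
) -> Dict[str, float]:
    """Score each candidate by comparing the filled prompt to the blank prompt.

    Assumption: a higher cosine similarity between the two embeddings means
    the encoder treats the candidate as more consistent with the prompt.
    """
    anchor = encode(template.replace(slot, ""))
    return {
        cand: cosine(anchor, encode(template.replace(slot, cand)))
        for cand in candidates
    }


if __name__ == "__main__":
    scores = similarity_probe(toy_encode, "a [*] banana", ["yellow", "red"])
    best = max(scores, key=scores.get)
    print(best, scores)
```

With a real model, `toy_encode` would be swapped for a function that returns the encoder's pooled text embedding; nothing else in the scoring loop depends on the toy vectors.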
