In this paper, we study how to use masked signal modeling in vision and language (V+L) representation learning. Instead of developing masked language modeling (MLM) and masked image modeling (MIM) independently, we propose joint masked vision and language modeling, in which the masked signal of one modality is reconstructed with the help of the other modality. This approach is motivated by the nature of image-text paired data: the image and the text convey almost the same information, but in different formats. Reconstructing the masked signal of one modality conditioned on the other also implicitly learns cross-modal alignment between language tokens and image patches. Our experiments on various V+L tasks show that the proposed method not only achieves state-of-the-art performance when a large amount of training data is available, but also outperforms competing methods by a significant margin in limited-data regimes.
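To make the idea concrete, below is a minimal, self-contained sketch of joint masked vision-and-language modeling. It is not the paper's implementation: all module names (`JointMaskedVLM`, `CrossModalReconstructor`), dimensions, mask ratios, and loss choices (cross-entropy for masked text tokens, regression for masked image patches) are illustrative assumptions. The sketch only shows the core mechanism the abstract describes: each modality's masked stream is reconstructed by cross-attending to the representation of the other modality, and both reconstruction losses are optimized jointly.

```python
# Minimal sketch (NOT the authors' implementation) of joint masked V+L modeling.
# Assumed names, dimensions, and mask ratio are illustrative only.
import torch
import torch.nn as nn

class CrossModalReconstructor(nn.Module):
    """One decoder block: self-attention over the masked stream, then
    cross-attention into the other modality, then a feed-forward layer."""
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(dim) for _ in range(3))

    def forward(self, x, context):
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h)[0]
        x = x + self.cross_attn(self.norm2(x), context, context)[0]
        return x + self.ffn(self.norm3(x))

class JointMaskedVLM(nn.Module):
    def __init__(self, vocab_size=30522, patch_dim=768, dim=256, mask_ratio=0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.txt_embed = nn.Embedding(vocab_size, dim)
        self.img_embed = nn.Linear(patch_dim, dim)
        self.txt_mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.img_mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.txt_decoder = CrossModalReconstructor(dim)
        self.img_decoder = CrossModalReconstructor(dim)
        self.txt_head = nn.Linear(dim, vocab_size)  # predict masked token ids
        self.img_head = nn.Linear(dim, patch_dim)   # regress masked patch signal

    def _mask(self, x, mask_token):
        # Replace a random subset of positions with a learned [MASK] embedding.
        B, N, _ = x.shape
        mask = torch.rand(B, N, device=x.device) < self.mask_ratio
        x = torch.where(mask.unsqueeze(-1), mask_token.expand_as(x), x)
        return x, mask

    def forward(self, token_ids, patches):
        txt = self.txt_embed(token_ids)
        img = self.img_embed(patches)
        txt_in, txt_mask = self._mask(txt, self.txt_mask_token)
        img_in, img_mask = self._mask(img, self.img_mask_token)
        # Each masked stream is reconstructed conditioned on the other
        # modality's representation -- the cross-modal signal.
        txt_rec = self.txt_head(self.txt_decoder(txt_in, img))
        img_rec = self.img_head(self.img_decoder(img_in, txt))
        loss_mlm = nn.functional.cross_entropy(txt_rec[txt_mask],
                                               token_ids[txt_mask])
        loss_mim = nn.functional.mse_loss(img_rec[img_mask], patches[img_mask])
        return loss_mlm + loss_mim

# Toy usage with random data: 16 text tokens, 49 image patches per sample.
model = JointMaskedVLM()
token_ids = torch.randint(0, 30522, (2, 16))
patches = torch.randn(2, 49, 768)
loss = model(token_ids, patches)
loss.backward()
```

Note the design choice this sketch encodes: because each reconstruction head must query the other modality through cross-attention to recover the masked content, the model is pushed to align language tokens with image patches, which is the implicit cross-modal alignment the abstract refers to.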