mPLUG:通过跨模式跳过连接,有成效、高效率地进行愿景-语言学习 (mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections) - 专知论文

会员服务 ·

0

值域 · MoDELS · 学成 · 图像字幕 · INFORMS ·

2022 年 5 月 24 日

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections

翻译：mPLUG:通过跨模式跳过连接,有成效、高效率地进行愿景-语言学习

Chenliang Li,Haiyang Xu,Junfeng Tian,Wei Wang,Ming Yan,Bin Bi,Jiabo Ye,Hehong Chen,Guohai Xu,Zheng Cao,Ji Zhang,Songfang Huang,Fei Huang,Jingren Zhou

Large-scale pretrained foundation models have been an emerging paradigm for building artificial intelligence (AI) systems, which can be quickly adapted to a wide range of downstream tasks. This paper presents mPLUG, a new vision-language foundation model for both cross-modal understanding and generation. Most existing pre-trained models suffer from the problems of low computational efficiency and information asymmetry brought by the long visual sequence in cross-modal alignment. To address these problems, mPLUG introduces an effective and efficient vision-language architecture with novel cross-modal skip-connections, which creates inter-layer shortcuts that skip a certain number of layers for time-consuming full self-attention on the vision side. mPLUG is pre-trained end-to-end on large-scale image-text pairs with both discriminative and generative objectives. It achieves state-of-the-art results on a wide range of vision-language downstream tasks, such as image captioning, image-text retrieval, visual grounding and visual question answering. mPLUG also demonstrates strong zero-shot transferability when directly transferred to multiple video-language tasks.

翻译：大规模预先培训的基础模型是建立人工智能系统的新兴范例,可以迅速适应广泛的下游任务,本文件介绍了MPLUG,这是跨模式理解和生成的一个新的视觉语言基础模型;大多数经过培训的现有模型都存在由于跨模式一致的长视序列带来的计算效率低和信息不对称问题;为解决这些问题,MPLUG引入了一个具有新颖跨模式跳动功能的高效视觉语言结构,从而创建了跨层次捷径,跳过一定的层,供视觉一侧完全自省使用。MPLUG是经过事先培训的大型图像配对端对端端端到端,既具有歧视性,又具有基因化目标。它实现了关于广泛视觉语言下游任务的最新结果,例如图像字幕、图像文本检索、视觉地面和视觉回答。 mPLUG还表明在直接转移到多种视频语言任务时,零点可转移性很强。

1

相关内容

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

e-Learner认知效率建模及自适应调整方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

自仿集合在小波分析和Fuglede谱集猜想中的应用

国家自然科学基金

0+阅读 · 2013年12月31日

控制系统的开环特性与耦合系统之间的关系

国家自然科学基金

1+阅读 · 2013年12月31日

MITA/MRP选择性剪切机制与天然免疫关系的研究

国家自然科学基金

0+阅读 · 2013年12月31日

幂零李群上热核估计的几个问题

国家自然科学基金

0+阅读 · 2012年12月31日

铈酸钡结构调控以及电学机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

非局部模型的自适应算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

沉默基因Siglec-5的再激活及其抑制T细胞活化的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

UGT基因簇进化及调控研究

国家自然科学基金

0+阅读 · 2009年12月31日

Web Service QoS的多维多尺度模型及评估、预测方法的研究

国家自然科学基金

1+阅读 · 2008年12月31日

WavThruVec: Latent speech representation as intermediate features for neural speech synthesis

Arxiv

0+阅读 · 2022年7月11日

LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action

Arxiv

0+阅读 · 2022年7月10日

QKVA grid: Attention in Image Perspective and Stacked DETR

Arxiv

0+阅读 · 2022年7月9日

ReFinED: An Efficient Zero-shot-capable Approach to End-to-End Entity Linking

Arxiv

0+阅读 · 2022年7月8日

The Elements of Temporal Sentence Grounding in Videos: A Survey and Future Directions

Arxiv

14+阅读 · 2022年1月20日

Pre-Trained Models: Past, Present and Future

Arxiv

19+阅读 · 2021年6月15日

Cross-Modal Discrete Representation Learning

Arxiv

18+阅读 · 2021年6月10日

Temporal Relational Modeling with Self-Supervision for Action Segmentation

Arxiv

13+阅读 · 2020年12月14日

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

Arxiv

14+阅读 · 2018年3月14日

The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

Arxiv

11+阅读 · 2018年1月11日

VIP会员

文章信息

相关主题

相关VIP内容

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【CMU博士论文】基础模型训练中网络规模数据的负责任与高效使用

《俄乌战争背景下俄罗斯的战略性海军分析（2022-2025年）》最新100页报告

人工智能时代背景下的未来海战

相关资讯

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

WavThruVec: Latent speech representation as intermediate features for neural speech synthesis

Arxiv

0+阅读 · 2022年7月11日

LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action

Arxiv

0+阅读 · 2022年7月10日

QKVA grid: Attention in Image Perspective and Stacked DETR

Arxiv

0+阅读 · 2022年7月9日

ReFinED: An Efficient Zero-shot-capable Approach to End-to-End Entity Linking

Arxiv

0+阅读 · 2022年7月8日

The Elements of Temporal Sentence Grounding in Videos: A Survey and Future Directions

Arxiv

14+阅读 · 2022年1月20日

Pre-Trained Models: Past, Present and Future

Arxiv

19+阅读 · 2021年6月15日

Cross-Modal Discrete Representation Learning

Arxiv

18+阅读 · 2021年6月10日

Temporal Relational Modeling with Self-Supervision for Action Segmentation

Arxiv

13+阅读 · 2020年12月14日

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

Arxiv

14+阅读 · 2018年3月14日

The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

Arxiv

11+阅读 · 2018年1月11日

相关基金

e-Learner认知效率建模及自适应调整方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

自仿集合在小波分析和Fuglede谱集猜想中的应用

国家自然科学基金

0+阅读 · 2013年12月31日

控制系统的开环特性与耦合系统之间的关系

国家自然科学基金

1+阅读 · 2013年12月31日

MITA/MRP选择性剪切机制与天然免疫关系的研究

国家自然科学基金

0+阅读 · 2013年12月31日

幂零李群上热核估计的几个问题

国家自然科学基金

0+阅读 · 2012年12月31日

铈酸钡结构调控以及电学机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

非局部模型的自适应算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

沉默基因Siglec-5的再激活及其抑制T细胞活化的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

UGT基因簇进化及调控研究

国家自然科学基金

0+阅读 · 2009年12月31日

Web Service QoS的多维多尺度模型及评估、预测方法的研究

国家自然科学基金

1+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员