Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks


August 31, 2022


Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei

From arXiv, 18 pages

A big convergence of language, vision, and multimodal pretraining is emerging. In this work, we introduce a general-purpose multimodal foundation model BEiT-3, which achieves state-of-the-art transfer performance on both vision and vision-language tasks. Specifically, we advance the big convergence from three aspects: backbone architecture, pretraining task, and model scaling up. We introduce Multiway Transformers for general-purpose modeling, where the modular architecture enables both deep fusion and modality-specific encoding. Based on the shared backbone, we perform masked "language" modeling on images (Imglish), texts (English), and image-text pairs ("parallel sentences") in a unified manner. Experimental results show that BEiT-3 obtains state-of-the-art performance on object detection (COCO), semantic segmentation (ADE20K), image classification (ImageNet), visual reasoning (NLVR2), visual question answering (VQAv2), image captioning (COCO), and cross-modal retrieval (Flickr30K, COCO).
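The unified masking objective described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `mask_sequence` helper, the `[MASK]` symbol, the made-up visual token names, and the 0.5 mask ratio are all assumptions chosen for illustration. It only shows the idea that one masking routine can be applied unchanged to image tokens, text tokens, and a concatenated image-text pair, with the masked-out originals kept as prediction targets.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol, an assumption of this sketch


def mask_sequence(tokens, mask_ratio, rng):
    """Mask a random subset of tokens.

    Returns (corrupted, targets), where corrupted is the token list with
    some entries replaced by MASK and targets maps each masked position
    to the original token a model would be trained to recover.
    """
    if not tokens:
        return [], {}
    # Mask at least one token and never more than the sequence length.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_ratio)))
    positions = rng.sample(range(len(tokens)), n_mask)
    targets = {i: tokens[i] for i in positions}
    corrupted = [MASK if i in targets else tok for i, tok in enumerate(tokens)]
    return corrupted, targets


rng = random.Random(0)  # fixed seed so repeated runs give the same masks

# Hypothetical inputs: discrete visual tokens ("Imglish") and words (English).
image_tokens = ["v17", "v3", "v42", "v8"]
text_tokens = ["a", "dog", "on", "grass"]
pair_tokens = image_tokens + text_tokens  # an image-text "parallel sentence"

# The same routine handles all three input types.
for name, seq in [("image", image_tokens), ("text", text_tokens), ("pair", pair_tokens)]:
    corrupted, targets = mask_sequence(seq, 0.5, rng)
    print(name, corrupted, targets)
```

In a real system the corrupted sequence would be fed to the shared backbone and a loss computed only at the masked positions; the sketch stops at producing the inputs and targets.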

