PaLI: 共同制定多语种语文图象模式 (PaLI: A Jointly-Scaled Multilingual Language-Image Model) - 专知论文

会员服务 ·

0

Vision · MoDELS · 语言模型化 · 缩放 · 变换 ·

2022 年 9 月 14 日

PaLI: A Jointly-Scaled Multilingual Language-Image Model

翻译：PaLI: 共同制定多语种语文图象模式

Xi Chen,Xiao Wang,Soravit Changpinyo,AJ Piergiovanni,Piotr Padlewski,Daniel Salz,Sebastian Goodman,Adam Grycner,Basil Mustafa,Lucas Beyer,Alexander Kolesnikov,Joan Puigcerver,Nan Ding,Keran Rong,Hassan Akbari,Gaurav Mishra,Linting Xue,Ashish Thapliyal,James Bradbury,Weicheng Kuo,Mojtaba Seyedhosseini,Chao Jia,Burcu Karagol Ayan,Carlos Riquelme,Andreas Steiner,Anelia Angelova,Xiaohua Zhai,Neil Houlsby,Radu Soricut

Effective scaling and a flexible task interface enable large language models to excel at many tasks.PaLI(PathwaysLanguage andImage model) extends this approach to the joint modeling of language and vision. PaLI generates text based on visual and textual inputs, and with this interface performs many vision, language, and multimodal tasks, in many languages. To train PaLI, we make use of large pretrained encoder-decoder language models and Vision Transformers (ViTs). This allows us to capitalize on their existing capabilities and leverage the substantial cost of training them. We find that joint scaling of the vision and language components is important. Since existing Transformers for language are much larger than their vision counterparts, we train the largest ViT to date (ViT-e) to quantify the benefits from even larger-capacity vision models. To train PaLI, we create a large multilingual mix of pretraining tasks, based on a new image-text training set containing 10B images and texts in over 100 languages. PaLI achieves state-of-the-art in multiple vision and language tasks (such as captioning, visual question-answering, scene-text understanding), while retaining a simple, modular, and scalable design.

翻译：有效缩放和灵活的任务界面使大型语言模型能够出色地完成许多任务。 PALI( PathwaysLanguage and Imaage 模型) 将这一方法推广到语言和视觉的联合模型。 PALI 生成基于视觉和文字投入的文本, 并且与该界面一起以多种语言执行许多视觉、语言和多式联运任务。培训PALI 时, 我们使用大型预先训练的编码解码语言模型和视觉变异器( VITs) 。这使我们能够利用现有的能力, 并利用培训它们的大量费用。我们认为, 共同扩大视觉和语言组成部分很重要。由于现有的语言变换器比其视觉和视觉对应器要大得多, 我们培训迄今为止最大的VIT( VIT-e) 来量化能力更大的视觉模型的好处。为了培训PALI, 我们根据包含100多种语言的10B图像和文本的新培训数据集, 创建了大量的多语言前培训组合。 PALI 在多个视觉和语言任务( 如字幕、视觉解说、视觉解说模块、理解) 中保留一个简单的模块设计、。

0

相关内容

Vision

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【医学图像处理中的因果性】52页ppt，Causality Matters in Medical Imaging

【医学图像处理中的因果性】52页ppt，Causality Matters in Medical Imaging

专知会员服务

60+阅读 · 2020年3月14日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【技术报告】诺亚开源中文预训练语言模型“哪吒”（NEZHA: Neural Contextualized Representation for Chinese Language Understanding）

【技术报告】诺亚开源中文预训练语言模型“哪吒”（NEZHA: Neural Contextualized Representation for Chinese Language Understanding）

专知会员服务

21+阅读 · 2019年12月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

肝细胞肝癌中高表达的PRC1基因功能及其受CTCF调控的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Notch信号系统对骨髓间充质干细胞向GABA能神经元分化的调节作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

利用脂质组学和SIRT1信号通路分析研究镉致糖尿病肾病发生的作用机制

国家自然科学基金

0+阅读 · 2013年12月31日

Notch信号通路参与家蚕胚胎发育分子机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

LIMK1：罗格列酮抑制人胃癌细胞增殖、迁移及侵袭的作用靶点

国家自然科学基金

0+阅读 · 2012年12月31日

组蛋白甲基化修饰调控拟南芥冷响应基因TCF1的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

发光二极管LED非相干宽带腔增强吸收光谱技术对大气HONO的定量方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

靶向Notch-1的miRNA在浸润性膀胱癌中的功能及分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

miR-124通过靶向Notch信号通路在阿尔茨海默氏病中发挥作用的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

新基因Chin1在神经细胞凋亡中的作用及其分子机制

国家自然科学基金

0+阅读 · 2008年12月31日

Training Language Models with Memory Augmentation

Arxiv

0+阅读 · 2022年10月25日

Lexical Generalization Improves with Larger Models and Longer Training

Arxiv

0+阅读 · 2022年10月25日

FCM: Forgetful Causal Masking Makes Causal Language Models Better Zero-Shot Learners

Arxiv

0+阅读 · 2022年10月24日

Instruction-Following Agents with Jointly Pre-Trained Vision-Language Models

Arxiv

0+阅读 · 2022年10月24日

Few-shot Learning with Multilingual Language Models

Arxiv

0+阅读 · 2022年10月24日

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Arxiv

15+阅读 · 2021年9月22日

From Show to Tell: A Survey on Image Captioning

Arxiv

15+阅读 · 2021年7月14日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

End-to-End Dense Video Captioning with Masked Transformer

Arxiv

14+阅读 · 2018年4月3日

Image Captioning at Will: A Versatile Scheme for Effectively Injecting Sentiments into Image Descriptions

Arxiv

16+阅读 · 2018年1月30日

VIP会员

文章信息

相关主题

语言模型化

相关VIP内容

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【医学图像处理中的因果性】52页ppt，Causality Matters in Medical Imaging

【医学图像处理中的因果性】52页ppt，Causality Matters in Medical Imaging

专知会员服务

60+阅读 · 2020年3月14日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【技术报告】诺亚开源中文预训练语言模型“哪吒”（NEZHA: Neural Contextualized Representation for Chinese Language Understanding）

【技术报告】诺亚开源中文预训练语言模型“哪吒”（NEZHA: Neural Contextualized Representation for Chinese Language Understanding）

专知会员服务

21+阅读 · 2019年12月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

新质生成式AI赋能产业变革的实践与路径

用于多模态大模型的离散标记化：全面综述

Nature综述：金融网络中的物理学

【CMU博士论文】通信高效且差分隐私的优化方法

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

相关论文

Training Language Models with Memory Augmentation

Arxiv

0+阅读 · 2022年10月25日

Lexical Generalization Improves with Larger Models and Longer Training

Arxiv

0+阅读 · 2022年10月25日

FCM: Forgetful Causal Masking Makes Causal Language Models Better Zero-Shot Learners

Arxiv

0+阅读 · 2022年10月24日

Instruction-Following Agents with Jointly Pre-Trained Vision-Language Models

Arxiv

0+阅读 · 2022年10月24日

Few-shot Learning with Multilingual Language Models

Arxiv

0+阅读 · 2022年10月24日

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Arxiv

15+阅读 · 2021年9月22日

From Show to Tell: A Survey on Image Captioning

Arxiv

15+阅读 · 2021年7月14日

Making Pre-trained Language Models Better Few-shot Learners

Arxiv

14+阅读 · 2020年12月31日

End-to-End Dense Video Captioning with Masked Transformer

Arxiv

14+阅读 · 2018年4月3日

Image Captioning at Will: A Versatile Scheme for Effectively Injecting Sentiments into Image Descriptions

Arxiv

16+阅读 · 2018年1月30日

相关基金

肝细胞肝癌中高表达的PRC1基因功能及其受CTCF调控的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Notch信号系统对骨髓间充质干细胞向GABA能神经元分化的调节作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

利用脂质组学和SIRT1信号通路分析研究镉致糖尿病肾病发生的作用机制

国家自然科学基金

0+阅读 · 2013年12月31日

Notch信号通路参与家蚕胚胎发育分子机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

LIMK1：罗格列酮抑制人胃癌细胞增殖、迁移及侵袭的作用靶点

国家自然科学基金

0+阅读 · 2012年12月31日

组蛋白甲基化修饰调控拟南芥冷响应基因TCF1的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

发光二极管LED非相干宽带腔增强吸收光谱技术对大气HONO的定量方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

靶向Notch-1的miRNA在浸润性膀胱癌中的功能及分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

miR-124通过靶向Notch信号通路在阿尔茨海默氏病中发挥作用的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

新基因Chin1在神经细胞凋亡中的作用及其分子机制

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员