PaLI: A Jointly-Scaled Multilingual Language-Image Model - Zhuanzhi Papers


Vision · Models · Language Modeling · Scaling · Transformers

September 16, 2022

Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut

Effective scaling and a flexible task interface enable large language models to excel at many tasks. PaLI (Pathways Language and Image model) extends this approach to the joint modeling of language and vision. PaLI generates text based on visual and textual inputs, and with this interface performs many vision, language, and multimodal tasks, in many languages. To train PaLI, we make use of large pretrained encoder-decoder language models and Vision Transformers (ViTs). This allows us to capitalize on their existing capabilities and leverage the substantial cost of training them. We find that joint scaling of the vision and language components is important. Since existing Transformers for language are much larger than their vision counterparts, we train the largest ViT to date (ViT-e) to quantify the benefits from even larger-capacity vision models. To train PaLI, we create a large multilingual mix of pretraining tasks, based on a new image-text training set containing 10B images and texts in over 100 languages. PaLI achieves state-of-the-art in multiple vision and language tasks (such as captioning, visual question-answering, scene-text understanding), while retaining a simple, modular, and scalable design.
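The abstract describes a single interface: an image and a text prompt go in, and text comes out. The following is a minimal sketch of how such an input sequence might be assembled, not PaLI's implementation. It assumes that a ViT-style patch embedding can be stood in for by one linear projection (the real model runs a full ViT over the patches), that the resulting visual tokens are concatenated with the embedded text tokens on the encoder side of the encoder-decoder language model, and that the 224×224 resolution, 14-pixel patches, 64-dimensional width, and 10-token prompt are illustrative values only. `patchify` and `build_encoder_input` are hypothetical names.

```python
import numpy as np

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches, ViT-style."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    gh, gw = h // patch, w // patch
    # (gh, patch, gw, patch, c) -> (gh, gw, patch, patch, c) -> (gh*gw, patch*patch*c)
    x = image.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * c)

def build_encoder_input(image, text_embeddings, patch, proj):
    """Hypothetical: embed patches (a linear projection stands in for the ViT)
    and prepend the visual tokens to the embedded text prompt."""
    visual_tokens = patchify(image, patch) @ proj  # (num_patches, d_model)
    return np.concatenate([visual_tokens, text_embeddings], axis=0)

rng = np.random.default_rng(0)
d_model = 64                                   # illustrative model width
patch = 14                                     # illustrative patch size
image = rng.standard_normal((224, 224, 3))     # stand-in for a preprocessed image
proj = rng.standard_normal((patch * patch * 3, d_model)) * 0.02
prompt = rng.standard_normal((10, d_model))    # stand-in for 10 embedded prompt tokens

seq = build_encoder_input(image, prompt, patch, proj)
print(seq.shape)  # (266, 64)
```

With these values the image yields (224 / 14)² = 256 visual tokens, so the encoder sequence has 256 + 10 = 266 positions. A decoder (not shown) would then generate the output text, such as a caption, an answer, or recognized scene text, conditioned on that sequence; changing the task only means changing the prompt, which is what makes the interface flexible.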
