ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities - 专知论文

会员服务 ·

0

模态 · Extensibility · MoDELS · contrastive · 表示 ·

2023 年 5 月 18 日

ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

翻译：暂无翻译

Peng Wang,Shijie Wang,Junyang Lin,Shuai Bai,Xiaohuan Zhou,Jingren Zhou,Xinggang Wang,Chang Zhou

from arxiv, 30 pages, 9 figures, 18 tables

In this work, we explore a scalable way for building a general representation model toward unlimited modalities. We release ONE-PEACE, a highly extensible model with 4B parameters that can seamlessly align and integrate representations across vision, audio, and language modalities. The architecture of ONE-PEACE comprises modality adapters, shared self-attention layers, and modality FFNs. This design allows for the easy extension of new modalities by adding adapters and FFNs, while also enabling multi-modal fusion through self-attention layers. To pretrain ONE-PEACE, we develop two modality-agnostic pretraining tasks, cross-modal aligning contrast and intra-modal denoising contrast, which align the semantic space of different modalities and capture fine-grained details within modalities concurrently. With the scaling-friendly architecture and pretraining tasks, ONE-PEACE has the potential to expand to unlimited modalities. Without using any vision or language pretrained model for initialization, ONE-PEACE achieves leading results on a wide range of uni-modal and multi-modal tasks, including image classification (ImageNet), semantic segmentation (ADE20K), audio-text retrieval (AudioCaps, Clotho), audio classification (ESC-50, FSD50K, VGGSound), audio question answering (AVQA), image-text retrieval (MSCOCO, Flickr30K), and visual grounding (RefCOCO/+/g). Code is available at https://github.com/OFA-Sys/ONE-PEACE.

翻译：暂无翻译

1

相关内容

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

Artificial Intelligence: Ready to Ride the Wave? BCG 28页PPT

Artificial Intelligence: Ready to Ride the Wave? BCG 28页PPT

专知会员服务

28+阅读 · 2022年2月20日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

专知会员服务

24+阅读 · 2019年11月4日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

专知

12+阅读 · 2018年3月24日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

一类离散Hindmarsh-Rose模型的分支延拓

国家自然科学基金

0+阅读 · 2015年12月31日

电子束泵浦AlGaN深紫外激光器的研究

国家自然科学基金

0+阅读 · 2015年12月31日

胶质瘤侵袭过程中DNMT1沉默miR-134与ERK信号通路自激活的表观新机制

国家自然科学基金

0+阅读 · 2015年12月31日

煤粉非均相MILD燃烧及燃料型NOx生成机理研究

国家自然科学基金

0+阅读 · 2015年12月31日

纳米Ni/(La,Sr)TiO3+δ-YSZ复合阴极体系的制备与电解CO2研究

国家自然科学基金

0+阅读 · 2013年12月31日

Internet网中病毒的检测-扩散耦合动力学模型及最优控制策略的研究

国家自然科学基金

0+阅读 · 2013年12月31日

具有一般输运系数的可压缩Navier-Stokes型的方程组解的性态分析

国家自然科学基金

0+阅读 · 2013年12月31日

ARK5/p38MAPK/Pim-3信号通路在胃癌发生、发展中的作用及机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于转录组测序筛选马铃薯块茎休眠与发芽相关基因及功能鉴定

国家自然科学基金

0+阅读 · 2011年12月31日

基于Surfacelet多尺度积的三维SAR图像去噪与分割

国家自然科学基金

0+阅读 · 2009年12月31日

Prompting Diffusion Representations for Cross-Domain Semantic Segmentation

Arxiv

0+阅读 · 2023年7月5日

Knowledge-Aware Audio-Grounded Generative Slot Filling for Limited Annotated Data

Arxiv

0+阅读 · 2023年7月4日

Exploring the In-context Learning Ability of Large Language Model for Biomedical Concept Linking

Arxiv

0+阅读 · 2023年7月3日

Understand Legal Documents with Contextualized Large Language Models

Arxiv

0+阅读 · 2023年7月2日

Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding

Arxiv

0+阅读 · 2023年7月2日

The Effect of Balancing Methods on Model Behavior in Imbalanced Classification Problems

Arxiv

0+阅读 · 2023年6月30日

Learning Imbalanced Data with Vision Transformers

Arxiv

11+阅读 · 2023年3月8日

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Arxiv

15+阅读 · 2021年9月22日

Unifying Vision-and-Language Tasks via Text Generation

Arxiv

10+阅读 · 2021年2月4日

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Arxiv

15+阅读 · 2018年10月11日

VIP会员

文章信息

相关主题

相关VIP内容

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

Artificial Intelligence: Ready to Ride the Wave? BCG 28页PPT

Artificial Intelligence: Ready to Ride the Wave? BCG 28页PPT

专知会员服务

28+阅读 · 2022年2月20日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

【Google论文】ALBERT:自我监督学习语言表达的精简BERT

专知会员服务

24+阅读 · 2019年11月4日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【CMU博士论文】基础模型训练中网络规模数据的负责任与高效使用

《俄乌战争背景下俄罗斯的战略性海军分析（2022-2025年）》最新100页报告

人工智能时代背景下的未来海战

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

专知

12+阅读 · 2018年3月24日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

相关论文

Prompting Diffusion Representations for Cross-Domain Semantic Segmentation

Arxiv

0+阅读 · 2023年7月5日

Knowledge-Aware Audio-Grounded Generative Slot Filling for Limited Annotated Data

Arxiv

0+阅读 · 2023年7月4日

Exploring the In-context Learning Ability of Large Language Model for Biomedical Concept Linking

Arxiv

0+阅读 · 2023年7月3日

Understand Legal Documents with Contextualized Large Language Models

Arxiv

0+阅读 · 2023年7月2日

Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding

Arxiv

0+阅读 · 2023年7月2日

The Effect of Balancing Methods on Model Behavior in Imbalanced Classification Problems

Arxiv

0+阅读 · 2023年6月30日

Learning Imbalanced Data with Vision Transformers

Arxiv

11+阅读 · 2023年3月8日

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Arxiv

15+阅读 · 2021年9月22日

Unifying Vision-and-Language Tasks via Text Generation

Arxiv

10+阅读 · 2021年2月4日

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Arxiv

15+阅读 · 2018年10月11日

相关基金

一类离散Hindmarsh-Rose模型的分支延拓

国家自然科学基金

0+阅读 · 2015年12月31日

电子束泵浦AlGaN深紫外激光器的研究

国家自然科学基金

0+阅读 · 2015年12月31日

胶质瘤侵袭过程中DNMT1沉默miR-134与ERK信号通路自激活的表观新机制

国家自然科学基金

0+阅读 · 2015年12月31日

煤粉非均相MILD燃烧及燃料型NOx生成机理研究

国家自然科学基金

0+阅读 · 2015年12月31日

纳米Ni/(La,Sr)TiO3+δ-YSZ复合阴极体系的制备与电解CO2研究

国家自然科学基金

0+阅读 · 2013年12月31日

Internet网中病毒的检测-扩散耦合动力学模型及最优控制策略的研究

国家自然科学基金

0+阅读 · 2013年12月31日

具有一般输运系数的可压缩Navier-Stokes型的方程组解的性态分析

国家自然科学基金

0+阅读 · 2013年12月31日

ARK5/p38MAPK/Pim-3信号通路在胃癌发生、发展中的作用及机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于转录组测序筛选马铃薯块茎休眠与发芽相关基因及功能鉴定

国家自然科学基金

0+阅读 · 2011年12月31日

基于Surfacelet多尺度积的三维SAR图像去噪与分割

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员