Foundation models, or pre-trained models, have substantially improved the performance of various language, vision, and vision-language understanding tasks. However, existing foundation models perform best on only one type of task, namely language, vision, or vision-language. It remains an open question whether it is possible to construct a foundation model that performs best across all understanding tasks, which we call a general foundation model. In this paper, we propose a new general foundation model, X-FM (the X-Foundation Model). X-FM has one language encoder, one vision encoder, and one fusion encoder, as well as a new training method. The training method includes two new techniques for learning X-FM from text, image, and image-text pair data. One is to stop gradients from the vision-language training when learning the language encoder. The other is to leverage the vision-language training to guide the learning of the vision encoder. Extensive experiments on benchmark datasets show that X-FM significantly outperforms existing general foundation models and performs better than, or comparably to, existing foundation models built specifically for language, vision, or vision-language understanding.
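The stop-gradient technique can be illustrated with a minimal toy sketch. This is a hypothetical scalar example, not the paper's actual architecture or training code: the point is only that when the language feature is detached before entering the fusion (vision-language) loss, that loss contributes nothing to the gradient of the language encoder's weights, which are then updated by the language-only loss alone.

```python
# Toy illustration (hypothetical, pure Python) of stop-gradient: the fusion
# (vision-language) loss is computed on a "detached" language feature, so it
# does not back-propagate into the language encoder's weight w_lang.

def language_encoder(w_lang, text):
    # Toy "encoder": a single scalar weight times a scalar input.
    return w_lang * text

def grad_wrt_w_lang(w_lang, text, h_vis, target, stop_grad):
    """Analytic gradient of (language_loss + fusion_loss) w.r.t. w_lang,
    where both toy losses are squared errors."""
    h = language_encoder(w_lang, text)
    g = 2.0 * (h - target) * text        # gradient from the language-only loss
    if not stop_grad:
        # Without stop-gradient, the fusion loss (h - h_vis)^2 also
        # back-propagates through h into w_lang.
        g += 2.0 * (h - h_vis) * text
    return g

g_stop = grad_wrt_w_lang(0.5, text=2.0, h_vis=3.0, target=1.0, stop_grad=True)
g_full = grad_wrt_w_lang(0.5, text=2.0, h_vis=3.0, target=1.0, stop_grad=False)
print(g_stop, g_full)  # with stop-gradient the fusion term is absent
```

In an autodiff framework the same effect is typically achieved by detaching the language encoder's output before it is fed to the fusion encoder (e.g. `h.detach()` in PyTorch), so the vision-language objectives shape only the fusion encoder while the language encoder keeps learning from text data.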