A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT

Ce Zhou, Qian Li, Chen Li, Jun Yu, Yixin Liu, Guangjing Wang, Kai Zhang, Cheng Ji, Qiben Yan, Lifang He, Hao Peng, Jianxin Li, Jia Wu, Ziwei Liu, Pengtao Xie, Caiming Xiong, Jian Pei, Philip S. Yu, Lichao Sun

From arXiv, 97 pages, 16 figures

Pretrained Foundation Models (PFMs) are regarded as the foundation for various downstream tasks across different data modalities. A pretrained foundation model, such as BERT, GPT-3, MAE, DALL-E, or ChatGPT, is trained on large-scale data and provides a reasonable parameter initialization for a wide range of downstream applications. The idea of pretraining behind PFMs plays an important role in the application of large models. Unlike earlier methods that use convolutional and recurrent modules for feature extraction, the generative pre-training (GPT) method uses the Transformer as the feature extractor and is trained on large datasets with an autoregressive paradigm. Similarly, BERT applies Transformers, trained on large datasets, as a contextual language model. Recently, ChatGPT has shown promising success among large language models; it applies an autoregressive language model with zero-shot or few-shot prompting. With the extraordinary success of PFMs, AI has made waves in a variety of fields over the past few years. Numerous methods, datasets, and evaluation metrics have been proposed in the literature, and the need for an updated survey is rising. This study provides a comprehensive review of recent research advancements, current and future challenges, and opportunities for PFMs in text, image, graph, and other data modalities. We first review the basic components and existing pretraining methods in natural language processing, computer vision, and graph learning. We then discuss advanced PFMs for other data modalities, as well as unified PFMs that take data quality and quantity into account. We also discuss relevant research on the fundamentals of PFMs, including model efficiency and compression, security, and privacy. Finally, we lay out key implications, future research directions, challenges, and open problems.
