Pretrained Foundation Models (PFMs) are regarded as the foundation for various downstream tasks across different data modalities. A PFM, such as BERT, GPT-3, MAE, DALL-E, or ChatGPT, is trained on large-scale data and provides a reasonable parameter initialization for a wide range of downstream applications. The idea of pretraining behind PFMs plays an important role in the application of large models. Unlike earlier methods that apply convolutional and recurrent modules for feature extraction, the generative pre-training (GPT) method applies the Transformer as the feature extractor and is trained on large datasets with an autoregressive paradigm. Similarly, BERT applies Transformers to train on large datasets as a contextual language model. More recently, ChatGPT has shown promising success with large language models, applying an autoregressive language model with zero-shot and few-shot prompting. With the extraordinary success of PFMs, AI has made waves in a variety of fields over the past few years. Considerable methods, datasets, and evaluation metrics have been proposed in the literature, raising the need for an updated survey. This study provides a comprehensive review of recent research advances, current and future challenges, and opportunities for PFMs in text, image, graph, and other data modalities. We first review the basic components and existing pretraining methods in natural language processing, computer vision, and graph learning. We then discuss advanced PFMs for other data modalities and unified PFMs, considering data quality and quantity. In addition, we discuss relevant research on the fundamentals of PFMs, including model efficiency and compression, security, and privacy. Finally, we lay out key implications, future research directions, challenges, and open problems.
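To make the two pretraining paradigms mentioned above concrete, the sketch below contrasts a GPT-style autoregressive objective with a BERT-style masked language modeling objective on toy token ids. This is a minimal illustration, not code from any of the surveyed models: the tiny Transformer encoder, vocabulary size, masking rate, and [MASK] token id are all assumptions made for the example.

# Minimal sketch (not code from the surveyed models): contrasting the two
# pretraining objectives described above -- GPT-style autoregressive language
# modeling and BERT-style masked language modeling -- on toy token ids.
# The tiny encoder, vocabulary size, masking rate, and [MASK] id are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, d_model, seq_len, batch = 100, 32, 8, 4
tokens = torch.randint(1, vocab_size, (batch, seq_len))  # toy corpus batch (id 0 reserved)

# A tiny Transformer encoder stands in for the shared feature extractor.
embed = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=1
)
lm_head = nn.Linear(d_model, vocab_size)

# 1) Autoregressive objective (GPT-style): predict token t from tokens < t,
#    enforced with a causal (upper-triangular) attention mask.
causal_mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
hidden = encoder(embed(tokens), mask=causal_mask)
ar_loss = F.cross_entropy(
    lm_head(hidden[:, :-1]).reshape(-1, vocab_size),  # logits at positions 0..T-2
    tokens[:, 1:].reshape(-1),                        # next-token targets
)

# 2) Masked objective (BERT-style): corrupt a random subset of positions and
#    predict the original tokens only at those positions, using bidirectional context.
mask_token_id = 0                                  # assumed [MASK] id
mask = torch.rand(tokens.shape) < 0.15             # ~15% of positions
mask[:, 0] = True                                  # guarantee at least one masked position
corrupted = tokens.masked_fill(mask, mask_token_id)
hidden = encoder(embed(corrupted))                 # no causal mask: full bidirectional context
mlm_loss = F.cross_entropy(lm_head(hidden)[mask], tokens[mask])

print(f"autoregressive loss: {ar_loss.item():.3f}, masked-LM loss: {mlm_loss.item():.3f}")

In practice, these objectives are optimized over massive corpora with far larger Transformers; that optimization is what the survey refers to as pretraining, and the resulting parameters serve as the initialization for downstream tasks.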