There have been various types of pretraining architectures, including autoregressive models (e.g., GPT), autoencoding models (e.g., BERT), and encoder-decoder models (e.g., T5). On the other hand, NLP tasks differ in nature, with three main categories being classification, unconditional generation, and conditional generation. However, no single pretraining framework performs best across all tasks, which complicates model development and selection. We propose a novel pretraining framework, GLM (General Language Model), to address this challenge. Compared to previous work, our architecture has three major benefits: (1) it performs well on classification, unconditional generation, and conditional generation tasks with one single pretrained model; (2) it outperforms BERT-like models on classification due to improved pretrain-finetune consistency; (3) it naturally handles variable-length blank filling, which is crucial for many downstream tasks. Empirically, GLM substantially outperforms BERT on the SuperGLUE natural language understanding benchmark with the same amount of pre-training data. Moreover, GLM with 1.25× the parameters of BERT-Large simultaneously achieves the best performance on NLU, conditional generation, and unconditional generation, which demonstrates its generalizability to different downstream tasks.
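To make the variable-length blank-filling idea concrete, the following is a minimal sketch of how an input sequence can be corrupted for autoregressive blank infilling: sampled spans are replaced by a single mask token in the context (Part A), and the spans themselves (Part B) are what the model learns to generate autoregressively. The function name, the special tokens [MASK]/[S]/[E], and the uniform span sampling are illustrative assumptions, not the paper's exact implementation (which, for example, draws span lengths from a different distribution and shuffles the spans).

```python
import random

MASK, START, END = "[MASK]", "[S]", "[E]"

def blank_infilling_example(tokens, num_spans=2, max_span_len=3, seed=0):
    """Corrupt a token sequence for blank infilling (illustrative sketch).

    Part A: the input with each sampled span replaced by a single [MASK].
    Part B: the masked spans, each framed by [S] ... [E], to be predicted
            autoregressively conditioned on Part A.
    """
    rng = random.Random(seed)
    n = len(tokens)
    # Sample non-overlapping spans (simplified: uniform starts and lengths).
    starts = sorted(rng.sample(range(n), num_spans))
    spans = []
    for i, s in enumerate(starts):
        limit = (starts[i + 1] if i + 1 < len(starts) else n) - s
        length = rng.randint(1, max(1, min(max_span_len, limit)))
        spans.append((s, s + length))

    part_a, part_b, cursor = [], [], 0
    for s, e in spans:
        part_a.extend(tokens[cursor:s])   # keep unmasked context
        part_a.append(MASK)               # one mask per span, any length
        part_b.extend([START] + tokens[s:e] + [END])
        cursor = e
    part_a.extend(tokens[cursor:])
    return part_a, part_b

if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog".split()
    a, b = blank_infilling_example(text)
    print("Part A:", " ".join(a))
    print("Part B:", " ".join(b))
```

Because each [MASK] can stand for a span of any length, the same objective covers short cloze-style blanks (classification via verbalizers), long blanks at the end of the text (unconditional generation), and blanks conditioned on a given prefix (conditional generation).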