Language modeling studies the probability distributions over strings of text. It is one of the most fundamental tasks in natural language processing (NLP) and is widely used in text generation, speech recognition, machine translation, etc. Conventional language models (CLMs) aim to predict the probability of linguistic sequences in a causal manner. In contrast, pre-trained language models (PLMs) cover broader concepts and can be used both for causal sequential modeling and for fine-tuning on downstream applications. PLMs have their own training paradigms (usually self-supervised) and serve as foundation models in modern NLP systems. This overview paper provides an introduction to both CLMs and PLMs from five aspects, i.e., linguistic units, structures, training methods, evaluation methods, and applications. Furthermore, we discuss the relationship between CLMs and PLMs and shed light on the future directions of language modeling in the pre-trained era.
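As a point of reference (the standard autoregressive formulation, not a result specific to this paper), the causal prediction that CLMs perform corresponds to factorizing the probability of a token sequence $x_1, \dots, x_T$ left-to-right via the chain rule:

\[
P(x_1, \dots, x_T) = \prod_{t=1}^{T} P(x_t \mid x_1, \dots, x_{t-1}).
\]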