Through their transfer learning abilities, highly-parameterized large pre-trained language models have dominated the NLP landscape across a multitude of downstream language tasks. Though linguistically proficient, these models are unable to incorporate the learning of non-linguistic entities (numerals and arithmetic reasoning), which limits their usage for tasks that require numeric comprehension or strict mathematical reasoning. However, as we illustrate in this paper, building a general-purpose language model that is also proficient in mathematical reasoning is not as straightforward as training it on a numeric dataset. In this work, we develop a novel framework that enables language models to become mathematically proficient while retaining their linguistic prowess. Specifically, we offer information-theoretic interventions to overcome the catastrophic forgetting of linguistic skills that occurs when non-linguistic skills are injected into language models.