Language models (LMs) have been instrumental to the rapid advance of natural language processing. This paper studies continual learning of LMs, in particular continual domain-adaptive pre-training (continual DAP-training). Existing research has shown that further pre-training an LM on a domain corpus to adapt it to that domain can improve end-task performance in the domain. This paper proposes a novel method to continually DAP-train an LM with a sequence of unlabeled domain corpora, adapting the LM to these domains to improve their end-task performance. The key novelty of our method is a soft-masking mechanism that directly controls the updates to the LM. A novel proxy is also proposed to preserve the general knowledge in the original LM. Additionally, the method contrasts the representations of the previously learned domain knowledge (including the general knowledge in the pre-trained LM) with the knowledge from the current full network to achieve knowledge integration. The method not only overcomes catastrophic forgetting but also achieves knowledge transfer to improve end-task performance. Empirical evaluation demonstrates the effectiveness of the proposed method.
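To make the soft-masking idea concrete, the following is a minimal PyTorch-style sketch, not the paper's actual implementation: it assumes a hypothetical `importance` dictionary of per-parameter scores in [0, 1] (computed, e.g., from a proxy loss on previously learned knowledge) and scales gradients by one minus those scores so that important units are updated less during DAP-training on a new domain.

```python
import torch


def soft_mask_gradients(model: torch.nn.Module, importance: dict) -> None:
    """Attenuate gradient updates for units deemed important to previously
    learned (general or domain) knowledge. `importance` maps parameter names
    to tensors of scores in [0, 1]; a score of 1 fully protects that unit.
    This is an illustrative sketch of soft-masking, not the authors' code."""
    for name, param in model.named_parameters():
        if param.grad is not None and name in importance:
            # Soft mask: scale the gradient instead of hard-blocking it.
            param.grad.mul_(1.0 - importance[name])


# Sketch of use inside a continual DAP-training loop (assumed names):
#   loss = masked_lm_loss + contrastive_integration_loss
#   loss.backward()
#   soft_mask_gradients(model, importance)  # apply soft masks before the step
#   optimizer.step()
#   optimizer.zero_grad()
```

In this sketch the masking is applied to gradients rather than to parameters, so no units are permanently frozen; how the importance scores and the contrastive integration loss are actually computed follows the method described in the paper.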