Pre-trained language models (PLMs) have achieved remarkable success on various natural language understanding tasks. Simple fine-tuning of PLMs, on the other hand, can be suboptimal for domain-specific tasks because PLMs cannot possibly cover knowledge from all domains. While adaptive pre-training can help PLMs obtain domain-specific knowledge, it incurs a large training cost. Moreover, adaptive pre-training can harm the PLM's performance on downstream tasks by causing catastrophic forgetting of its general knowledge. To overcome such limitations of adaptive pre-training for PLM adaptation, we propose a novel domain adaptation framework for PLMs, coined Knowledge-Augmented Language model Adaptation (KALA), which modulates the intermediate hidden representations of PLMs with domain knowledge, consisting of entities and their relational facts. We validate KALA on question answering and named entity recognition tasks over multiple datasets across various domains. The results show that, despite being computationally efficient, KALA largely outperforms adaptive pre-training. Code is available at: https://github.com/Nardien/KALA/.
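The core idea of modulating intermediate hidden representations with entity knowledge can be sketched as a FiLM-style conditional scale-and-shift: tokens linked to a known entity have their hidden states transformed by a scale and shift computed from the entity's embedding, while other tokens pass through unchanged. The sketch below is a minimal illustration with numpy, not the paper's implementation; the projection matrices `W_gamma` and `W_beta` and the function `modulate` are hypothetical names for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim = 8

# Hypothetical projections mapping an entity embedding to a per-dimension
# scale (gamma) and shift (beta); small init keeps modulation near identity.
W_gamma = rng.normal(scale=0.01, size=(hidden_dim, hidden_dim))
W_beta = rng.normal(scale=0.01, size=(hidden_dim, hidden_dim))

def modulate(hidden, entity_emb=None):
    """FiLM-style modulation of one token's hidden state.

    Tokens not linked to any entity pass through unchanged
    (equivalent to gamma = 1, beta = 0).
    """
    if entity_emb is None:
        return hidden
    gamma = 1.0 + entity_emb @ W_gamma  # scale, near identity at init
    beta = entity_emb @ W_beta          # shift
    return gamma * hidden + beta

# Toy sequence of 3 tokens; only the middle token mentions a known entity.
hidden_states = rng.normal(size=(3, hidden_dim))
entity_emb = rng.normal(size=(hidden_dim,))

out = np.stack([
    modulate(hidden_states[0]),
    modulate(hidden_states[1], entity_emb),
    modulate(hidden_states[2]),
])
```

Because the modulation defaults to the identity for entity-free tokens, the layer can be inserted into a frozen or fine-tuned PLM without disturbing representations of text that has no matched domain entities.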