Domain-adaptive pre-training (DA-training for short), also known as post-training, trains a pre-trained general-purpose language model (LM) on an unlabeled corpus from a particular domain so that end-tasks in that domain achieve improved performance. However, existing DA-training methods are in some sense blind: they do not explicitly identify what knowledge in the LM should be preserved and what should be changed by the domain corpus. This paper shows that the existing methods are suboptimal and proposes a novel method that performs a more informed adaptation of the knowledge in the LM by (1) soft-masking the attention heads based on their importance, to best preserve the general knowledge in the LM, and (2) contrasting the representations of the general knowledge and the full knowledge (both general and domain) to learn an integrated representation with both general and domain-specific knowledge. Experimental results demonstrate the effectiveness of the proposed approach.
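The two ingredients named above can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact formulation: the sigmoid gating of head outputs, the InfoNCE-style contrastive form, and all function names are assumptions. In practice the importance scores would come from an attribution method (e.g. gradients on general-knowledge data), and the representations from the LM itself.

```python
import math

def soft_mask_heads(head_outputs, importance, temperature=1.0):
    """Scale each attention head's output by a soft gate in (0, 1) derived
    from its importance score, so heads deemed important for general
    knowledge pass through largely unchanged while unimportant heads are
    damped (a hypothetical gating; the paper's scheme may differ)."""
    gates = [1.0 / (1.0 + math.exp(-s / temperature)) for s in importance]
    return [[g * x for x in head] for g, head in zip(gates, head_outputs)]

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def contrastive_loss(general_repr, full_repr, negatives, tau=0.1):
    """InfoNCE-style contrastive loss: pull the full (general + domain)
    representation toward the general representation while pushing it
    away from negative representations."""
    pos = math.exp(_cosine(general_repr, full_repr) / tau)
    neg = sum(math.exp(_cosine(full_repr, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))
```

Usage: during DA-training, the gates would modulate gradient flow or head outputs so that important heads are protected, while the contrastive term encourages the adapted model's representation to stay close to the general one yet integrate domain knowledge.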