Continual pretraining is a standard way of building a domain-specific pretrained language model from a general-domain language model. However, sequential task training may cause catastrophic forgetting, which degrades model performance on downstream tasks. In this paper, we propose a continual pretraining method for BERT-based models, named CBEAF-Adapting (Chinese Biomedical Enhanced Attention-FFN Adapting). Its main idea is to introduce a small number of additional attention heads and hidden units inside each self-attention layer and feed-forward network. Using the Chinese biomedical domain as a running example, we train a domain-specific language model named CBEAF-RoBERTa and evaluate it by applying the models to downstream tasks. The results demonstrate that, with only about 3% of the model parameters trained, our method achieves average performance gains of about 0.5% and 2% over the best-performing baseline model and the domain-specific model PCL-MedBERT, respectively. We also examine the forgetting problem under different pretraining methods; our method alleviates it by about 13% compared to fine-tuning.
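To make the core modification concrete, here is a minimal PyTorch sketch of the general idea of freezing a pretrained transformer layer and training only a few newly added attention heads and FFN hidden units. All names, dimensions, and the additive way the extra outputs are merged (ExtraHeadAttention, ExtraUnitFFN, n_extra_heads, d_extra) are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of the adapting idea: freeze the pretrained attention heads and FFN,
# add a small number of new trainable heads / hidden units, and merge their
# outputs back into the layer. Module names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExtraHeadAttention(nn.Module):
    """Frozen base self-attention plus a few new trainable heads."""

    def __init__(self, d_model=768, n_base_heads=12, n_extra_heads=1):
        super().__init__()
        self.base = nn.MultiheadAttention(d_model, n_base_heads, batch_first=True)
        for p in self.base.parameters():        # freeze pretrained heads
            p.requires_grad = False
        d_head = d_model // n_base_heads
        d_extra = d_head * n_extra_heads
        self.q = nn.Linear(d_model, d_extra)    # new, trainable heads
        self.k = nn.Linear(d_model, d_extra)
        self.v = nn.Linear(d_model, d_extra)
        self.out = nn.Linear(d_extra, d_model)
        self.n_extra_heads, self.d_head = n_extra_heads, d_head

    def forward(self, x):
        base_out, _ = self.base(x, x, x)
        b, t, _ = x.shape
        # reshape to (batch, extra_heads, tokens, d_head)
        q = self.q(x).view(b, t, self.n_extra_heads, self.d_head).transpose(1, 2)
        k = self.k(x).view(b, t, self.n_extra_heads, self.d_head).transpose(1, 2)
        v = self.v(x).view(b, t, self.n_extra_heads, self.d_head).transpose(1, 2)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        extra = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return base_out + self.out(extra)       # merge new heads into the layer


class ExtraUnitFFN(nn.Module):
    """Frozen base feed-forward network plus extra trainable hidden units."""

    def __init__(self, d_model=768, d_ff=3072, d_extra=64):
        super().__init__()
        self.base = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                  nn.Linear(d_ff, d_model))
        for p in self.base.parameters():        # freeze pretrained FFN
            p.requires_grad = False
        self.extra = nn.Sequential(nn.Linear(d_model, d_extra), nn.GELU(),
                                   nn.Linear(d_extra, d_model))

    def forward(self, x):
        return self.base(x) + self.extra(x)


x = torch.randn(2, 16, 768)                     # (batch, tokens, hidden)
y = ExtraUnitFFN()(ExtraHeadAttention()(x))
print(y.shape)                                  # torch.Size([2, 16, 768])
```

Under this setup only the extra heads and hidden units receive gradients, which is consistent with training only a few percent of the parameters while the pretrained weights stay intact.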