Continual pretraining is a standard way of building a domain-specific pretrained language model from a general-domain language model. However, sequential task training may cause catastrophic forgetting, which degrades model performance on downstream tasks. In this paper, we propose a continual pretraining method for BERT-based models, named CBEAF-Adapting (Chinese Biomedical Enhanced Attention-FFN Adapting). Its main idea is to introduce a small number of additional attention heads and hidden units inside each self-attention layer and feed-forward network. Using the Chinese biomedical domain as a running example, we train a domain-specific language model named CBEAF-RoBERTa and evaluate it by applying the models to downstream tasks. The results demonstrate that, with only about 3% of the model parameters trained, our method achieves average performance gains of about 0.5% and 2% over the best-performing baseline model and the domain-specific model PCL-MedBERT, respectively. We also examine the forgetting problem under different pretraining methods; our method alleviates it by about 13% compared to fine-tuning.
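To make the core modification concrete, here is a minimal PyTorch sketch of the general idea of freezing a pretrained transformer layer and training only a few newly added attention heads and FFN hidden units. All names, dimensions, and the additive way the extra outputs are merged (ExtraHeadAttention, ExtraUnitFFN, n_extra_heads, d_extra) are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of the adapting idea: freeze the pretrained attention heads and FFN,
# add a small number of new trainable heads / hidden units, and merge their
# outputs back into the layer. Module names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExtraHeadAttention(nn.Module):
    """Frozen base self-attention plus a few new trainable heads."""

    def __init__(self, d_model=768, n_base_heads=12, n_extra_heads=1):
        super().__init__()
        self.base = nn.MultiheadAttention(d_model, n_base_heads, batch_first=True)
        for p in self.base.parameters():        # freeze pretrained heads
            p.requires_grad = False
        d_head = d_model // n_base_heads
        d_extra = d_head * n_extra_heads
        self.q = nn.Linear(d_model, d_extra)    # new, trainable heads
        self.k = nn.Linear(d_model, d_extra)
        self.v = nn.Linear(d_model, d_extra)
        self.out = nn.Linear(d_extra, d_model)
        self.n_extra_heads, self.d_head = n_extra_heads, d_head

    def forward(self, x):
        base_out, _ = self.base(x, x, x)
        b, t, _ = x.shape
        # reshape to (batch, extra_heads, tokens, d_head)
        q = self.q(x).view(b, t, self.n_extra_heads, self.d_head).transpose(1, 2)
        k = self.k(x).view(b, t, self.n_extra_heads, self.d_head).transpose(1, 2)
        v = self.v(x).view(b, t, self.n_extra_heads, self.d_head).transpose(1, 2)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        extra = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return base_out + self.out(extra)       # merge new heads into the layer


class ExtraUnitFFN(nn.Module):
    """Frozen base feed-forward network plus extra trainable hidden units."""

    def __init__(self, d_model=768, d_ff=3072, d_extra=64):
        super().__init__()
        self.base = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                  nn.Linear(d_ff, d_model))
        for p in self.base.parameters():        # freeze pretrained FFN
            p.requires_grad = False
        self.extra = nn.Sequential(nn.Linear(d_model, d_extra), nn.GELU(),
                                   nn.Linear(d_extra, d_model))

    def forward(self, x):
        return self.base(x) + self.extra(x)


x = torch.randn(2, 16, 768)                     # (batch, tokens, hidden)
y = ExtraUnitFFN()(ExtraHeadAttention()(x))
print(y.shape)                                  # torch.Size([2, 16, 768])
```

Under this setup only the extra heads and hidden units receive gradients, which is consistent with training only a few percent of the parameters while the pretrained weights stay intact.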