Pretrained language models (PTLMs) are typically learned over a large, static corpus and further fine-tuned for various downstream tasks. However, when deployed in the real world, a PTLM-based model must deal with data distributions that deviate from what the PTLM was initially trained on. In this paper, we study a lifelong language model pretraining challenge in which a PTLM is continually updated to adapt to emerging data. Over a domain-incremental research paper stream and a chronologically ordered tweet stream, we incrementally pretrain a PTLM with different continual learning algorithms and track downstream task performance (after fine-tuning). We evaluate the PTLM's ability to adapt to new corpora while retaining knowledge learned from earlier corpora. Our experiments show that distillation-based approaches are most effective in retaining downstream performance on earlier domains. These algorithms also improve knowledge transfer, allowing models to achieve better downstream performance on the latest data, and improve temporal generalization when distribution gaps exist between training and evaluation data due to the passage of time. We believe our problem formulation, methods, and analysis will inspire future studies on the continual pretraining of language models.
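To make the setup concrete, below is a minimal sketch of distillation-regularized continual pretraining over a corpus stream, assuming a RoBERTa-style masked language model and a simple logit-distillation penalty against a frozen snapshot of the previous-stage model. The function and parameter names (`continual_pretrain`, `corpus_stream`, `distill_weight`) are illustrative and not taken from the paper; the actual algorithms, masking procedure, and hyperparameters may differ.

```python
# Illustrative sketch only: a continual pretraining loop with knowledge
# distillation from the previous-stage model, not the paper's exact method.
import copy
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)


def continual_pretrain(corpus_stream, distill_weight=1.0):
    """Adapt the model to each corpus in turn, distilling from a frozen
    snapshot taken before the current adaptation stage."""
    for corpus in corpus_stream:                      # e.g. domain- or time-ordered lists of text
        teacher = copy.deepcopy(model).eval()         # snapshot of the previous-stage model
        for p in teacher.parameters():
            p.requires_grad_(False)
        for text in corpus:
            batch = tokenizer(text, return_tensors="pt", truncation=True)
            labels = batch["input_ids"].clone()       # plain LM labels; real MLM masking omitted for brevity
            outputs = model(**batch, labels=labels)
            with torch.no_grad():
                teacher_logits = teacher(**batch).logits
            # The KL term keeps the current model's predictions close to the
            # earlier snapshot, mitigating forgetting of previous corpora.
            distill_loss = F.kl_div(
                F.log_softmax(outputs.logits, dim=-1),
                F.softmax(teacher_logits, dim=-1),
                reduction="batchmean",
            )
            loss = outputs.loss + distill_weight * distill_loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
```

In this sketch, downstream evaluation would be performed by fine-tuning a copy of the model after each stage, so that adaptation to the latest corpus and retention on earlier corpora can both be measured.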