Current pre-trained language models (PLMs) are typically trained on static data, ignoring that in real-world scenarios, streaming data from various sources may continuously grow. This requires PLMs to integrate information from all of these sources in a lifelong manner. Although this goal could be achieved by exhaustively pre-training on all the existing data, such a process is known to be computationally expensive. To this end, we propose ELLE, which aims at efficient lifelong pre-training on emerging data. Specifically, ELLE consists of (1) function-preserved model expansion, which flexibly expands an existing PLM's width and depth to improve the efficiency of knowledge acquisition, and (2) pre-trained domain prompts, which disentangle the versatile knowledge learned during pre-training and stimulate the proper knowledge for downstream tasks. We experiment with ELLE on streaming data from 5 domains using BERT and GPT. The results demonstrate the superiority of ELLE over various lifelong learning baselines in both pre-training efficiency and downstream performance. The code is publicly available at https://github.com/thunlp/ELLE.
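To illustrate the idea behind function-preserved model expansion, below is a minimal sketch of Net2Net-style width expansion for a toy two-layer MLP: new hidden units duplicate existing ones and their outgoing weights are split so the input-output function is unchanged before further pre-training. The names (`TinyMLP`, `expand_width`) are illustrative assumptions, not part of the official ELLE codebase, which applies analogous operators to Transformer layers.

```python
# A hedged sketch of function-preserving width expansion (Net2Net-style),
# in the spirit of ELLE's model expansion; not the authors' implementation.
import torch
import torch.nn as nn


class TinyMLP(nn.Module):
    def __init__(self, d_in: int, d_hidden: int, d_out: int):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(torch.relu(self.fc1(x)))


@torch.no_grad()
def expand_width(model: TinyMLP, new_hidden: int) -> TinyMLP:
    """Grow the hidden dimension while keeping the input-output mapping intact."""
    old_hidden = model.fc1.out_features
    assert new_hidden >= old_hidden
    # Map every new hidden unit to an existing one (old units map to themselves).
    mapping = torch.cat([
        torch.arange(old_hidden),
        torch.randint(0, old_hidden, (new_hidden - old_hidden,)),
    ])
    counts = torch.bincount(mapping, minlength=old_hidden).float()

    bigger = TinyMLP(model.fc1.in_features, new_hidden, model.fc2.out_features)
    # Duplicate the incoming weights and biases of the copied units.
    bigger.fc1.weight.copy_(model.fc1.weight[mapping])
    bigger.fc1.bias.copy_(model.fc1.bias[mapping])
    # Split the outgoing weights among duplicates so their sum is unchanged.
    bigger.fc2.weight.copy_(model.fc2.weight[:, mapping] / counts[mapping])
    bigger.fc2.bias.copy_(model.fc2.bias)
    return bigger


if __name__ == "__main__":
    torch.manual_seed(0)
    small = TinyMLP(8, 16, 4)
    large = expand_width(small, 32)
    x = torch.randn(2, 8)
    # Outputs match up to floating-point error, so training on new data can
    # resume from the larger model without discarding acquired knowledge.
    print(torch.allclose(small(x), large(x), atol=1e-6))
```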