Semiparametric Language Models Are Scalable Continual Learners

Semiparametric language models (LMs) have shown promise in continually learning from new text data by combining a parameterized neural LM with a growable non-parametric memory for memorizing new content. However, conventional semiparametric LMs eventually become prohibitively expensive to compute and store when applied to continual learning over streaming data, because the non-parametric memory grows linearly with the amount of data they learn from over time. To address this scalability issue, we present a simple and intuitive approach called Selective Memorization (SeMem), which memorizes only the difficult samples that the model is likely to struggle with. We demonstrate that SeMem improves the scalability of semiparametric LMs for continual learning over streaming data in two ways: (1) data-wise scalability: as the model becomes stronger through continual learning, it encounters fewer difficult cases that need to be memorized, so the growth of the non-parametric memory slows over time instead of remaining linear in the size of the training data; (2) model-wise scalability: SeMem allows a larger model to memorize fewer samples than its smaller counterpart, because a larger model more rarely encounters incomprehensible cases, resulting in a non-parametric memory that does not scale linearly with model size. We conduct extensive experiments on language modeling and downstream tasks to evaluate SeMem, showing that it enables a semiparametric LM to be a scalable continual learner with little forgetting.
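The selective memorization idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `SelectiveMemory`, the function `memory_growth`, the use of the model's probability for the correct next token as the difficulty signal, and the `threshold` value of 0.5 are all assumptions made for illustration. The abstract only states that difficult samples are memorized and easy ones are skipped.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

Key = Tuple[float, ...]  # hypothetical context representation used as a memory key


@dataclass
class SelectiveMemory:
    """Toy non-parametric memory that stores only 'difficult' samples."""

    # Assumed difficulty cutoff: memorize when p(correct token) is below this value.
    threshold: float = 0.5
    entries: List[Tuple[Key, str]] = field(default_factory=list)

    def maybe_memorize(self, key: Key, target: str, prob_of_target: float) -> bool:
        """Store (key, target) only if the model struggled; return True if stored."""
        if prob_of_target < self.threshold:
            self.entries.append((key, target))
            return True
        return False


def memory_growth(
    stream: Iterable[Tuple[Key, str, float]], memory: SelectiveMemory
) -> List[int]:
    """Feed a stream of (key, target, model probability) and track memory size."""
    sizes = []
    for key, target, prob in stream:
        memory.maybe_memorize(key, target, prob)
        sizes.append(len(memory.entries))
    return sizes


# Made-up stream: the probabilities stand in for a real LM's predictions.
stream = [
    ((0.1, 0.2), "cat", 0.90),  # easy: skipped
    ((0.3, 0.1), "dog", 0.20),  # hard: memorized
    ((0.5, 0.5), "emu", 0.40),  # hard: memorized
    ((0.0, 1.0), "fox", 0.75),  # easy: skipped
]
memory = SelectiveMemory(threshold=0.5)
sizes = memory_growth(stream, memory)
print(sizes)  # [0, 1, 2, 2]
```

A memory that stored every sample would reach size 4 on this stream. Under the assumed criterion the memory grows only on the two hard samples, which is the sub-linear growth the abstract attributes to a model that improves, or is larger, and therefore meets fewer difficult cases.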
