Continual Learning of Neural Machine Translation within Low Forgetting Risk Regions

This paper considers continual learning of large-scale pretrained neural machine translation models without access to the previous training data and without introducing model separation. We argue that the widely used regularization-based methods, which perform multi-objective learning with an auxiliary loss, suffer from the misestimate problem and cannot always achieve a good balance between the previous and new tasks. To solve this problem, we propose a two-stage training method based on the local features of the real loss. We first search for low forgetting risk regions, in which the model retains its performance on the previous task as the parameters are updated, so that catastrophic forgetting is avoided. We then continue training the model within such a region, using only the new training data, to fit the new task. Specifically, we propose two methods for searching the low forgetting risk regions: one based on the curvature of the loss, the other on the impact of the parameters on the model output. We conduct experiments on domain adaptation and on the more challenging language adaptation tasks, and the results show that our method achieves significant improvements over several strong baselines.
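
The two-stage idea described above can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes a quadratic previous-task loss with a known diagonal curvature, derives a per-parameter box from a loss budget, and then runs projected gradient descent on a hypothetical new-task loss inside that box. All names (`search_region`, `train_within_region`, `budget`, `curvature`) are illustrative assumptions.

```python
import math

def search_region(curvature, budget):
    """Stage 1 (toy): per-parameter radius such that a quadratic old-task
    loss 0.5 * sum(h_i * d_i**2) grows by at most `budget` inside the box."""
    n = len(curvature)
    return [math.sqrt(2.0 * budget / (n * h)) for h in curvature]

def train_within_region(theta_old, radius, grad_new, steps=200, lr=0.1):
    """Stage 2 (toy): gradient descent on the new task only, clipping each
    parameter back into [theta_old - radius, theta_old + radius]."""
    theta = list(theta_old)
    for _ in range(steps):
        g = grad_new(theta)
        for i in range(len(theta)):
            lo = theta_old[i] - radius[i]
            hi = theta_old[i] + radius[i]
            theta[i] = min(max(theta[i] - lr * g[i], lo), hi)
    return theta

# Assumed toy setup: parameter 0 is sharp (high curvature) for the old task,
# parameter 1 is flat, so only parameter 1 has much room to move.
theta_old = [0.0, 0.0]
curvature = [100.0, 0.01]
target_new = [1.0, 1.0]
budget = 0.02

def old_loss(theta):
    # Quadratic stand-in for the previous-task loss around theta_old.
    return 0.5 * sum(h * (t - t0) ** 2
                     for h, t, t0 in zip(curvature, theta, theta_old))

def grad_new(theta):
    # Gradient of the new-task loss 0.5 * sum((theta_i - target_i)**2).
    return [t - s for t, s in zip(theta, target_new)]

radius = search_region(curvature, budget)
theta_new = train_within_region(theta_old, radius, grad_new)
print(radius, theta_new, old_loss(theta_new))
```

In this sketch the sharp parameter stops at the edge of its box while the flat parameter reaches the new-task optimum, so the old-task loss stays within the budget without any previous training data being used in the second stage.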
