This paper considers continual learning of a large-scale pretrained neural machine translation model without accessing the previous training data or introducing model separation. We argue that the widely used regularization-based methods, which perform multi-objective learning with an auxiliary loss, suffer from a misestimation problem and cannot always achieve a good balance between the previous and new tasks. To solve this problem, we propose a two-stage training method based on the local features of the real loss. We first search for low forgetting risk regions, where the model can retain its performance on the previous task as the parameters are updated, to avoid the catastrophic forgetting problem. Then we continually train the model within this region using only the new training data to fit the new task. Specifically, we propose two methods to search for the low forgetting risk regions, based on the curvature of the loss and on the impacts of the parameters on the model output, respectively. We conduct experiments on domain adaptation and on more challenging language adaptation tasks, and the experimental results show that our method achieves significant improvements over several strong baselines.
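As a rough illustration of the two-stage idea, the sketch below first estimates a diagonal curvature proxy (average squared gradients on the previous task) to pick out the flattest, low-forgetting-risk parameter directions, and then fine-tunes on the new task while masking gradients outside that region. All names here (`compute_curvature`, `build_low_risk_masks`, `finetune_in_region`, the quantile-style threshold rule) are illustrative assumptions and do not reproduce the paper's exact procedure.

```python
import torch

def compute_curvature(model, prev_data_loader, loss_fn, device="cpu"):
    """Diagonal curvature proxy: average squared gradients on the previous task.
    (Assumption: this stands in for the paper's curvature-based criterion.)"""
    curvature = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    n_batches = 0
    for src, tgt in prev_data_loader:
        model.zero_grad()
        loss = loss_fn(model(src.to(device)), tgt.to(device))
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                curvature[n] += p.grad.detach() ** 2
        n_batches += 1
    return {n: c / max(n_batches, 1) for n, c in curvature.items()}

def build_low_risk_masks(curvature, keep_ratio=0.3):
    """Stage 1: mark the keep_ratio fraction of parameters with the smallest
    curvature (flattest loss directions) as the low forgetting risk region."""
    scores = torch.cat([c.flatten() for c in curvature.values()])
    k = max(int(keep_ratio * scores.numel()), 1)
    threshold = torch.kthvalue(scores, k).values
    return {n: (c <= threshold).float() for n, c in curvature.items()}

def finetune_in_region(model, masks, new_data_loader, loss_fn, lr=1e-4, device="cpu"):
    """Stage 2: continue training only on the new task, zeroing out gradient
    components that would leave the low-risk region."""
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for src, tgt in new_data_loader:
        optim.zero_grad()
        loss = loss_fn(model(src.to(device)), tgt.to(device))
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                p.grad.mul_(masks[n])  # update only low-risk parameters
        optim.step()
```

In this sketch the region is a per-parameter mask; the alternative criterion mentioned in the abstract (the impact of parameters on the model output) could replace `compute_curvature` with an output-sensitivity score while keeping the same two-stage structure.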