Adapting a trained Automatic Speech Recognition (ASR) model to new tasks results in catastrophic forgetting of old tasks, limiting the model's ability to learn continually and to be extended to new speakers, dialects, languages, etc. Focusing on End-to-End ASR, in this paper, we propose a simple yet effective method to overcome catastrophic forgetting: weight averaging. By simply taking the average of the previous and the adapted model, our method achieves high performance on both the old and new tasks. It can be further improved by introducing a knowledge distillation loss during the adaptation. We illustrate the effectiveness of our method on both monolingual and multilingual ASR. In both cases, our method strongly outperforms all baselines, even in its simplest form.
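The weight-averaging step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each model is represented as a dict mapping parameter names to scalar weights (stand-ins for real weight tensors), and the parameter names used below are hypothetical.

```python
def average_weights(prev_model, adapted_model, alpha=0.5):
    """Interpolate parameter-wise between the previous and the adapted model.

    alpha=0.5 gives the plain average of the two models; both dicts are
    assumed to share the same parameter names.
    """
    return {
        name: alpha * prev_model[name] + (1.0 - alpha) * adapted_model[name]
        for name in prev_model
    }

# Hypothetical toy parameters for illustration only.
prev = {"encoder.w": 1.0, "decoder.w": -2.0}
adapted = {"encoder.w": 3.0, "decoder.w": 0.0}
merged = average_weights(prev, adapted)  # {"encoder.w": 2.0, "decoder.w": -1.0}
```

With real networks the same interpolation would be applied tensor-wise over the two models' parameter dictionaries; exposing `alpha` makes the trade-off between retaining the old task and fitting the new one explicit.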