Adapting Automatic Speech Recognition (ASR) models to new domains results in a deterioration of performance on the original domain(s), a phenomenon called Catastrophic Forgetting (CF). Even monolingual ASR models cannot be extended to new accents, dialects, topics, etc., without suffering from CF, which prevents them from being continually enhanced unless all past data is stored. Fortunately, Continual Learning (CL) methods, which aim to enable continual adaptation while overcoming CF, can be used. In this paper, we implement an extensive number of CL methods for End-to-End ASR and test and compare their ability to extend a monolingual hybrid CTC-Transformer model across four new tasks. We find that the best-performing CL method closes the gap between the fine-tuned model (lower bound) and the model trained jointly on all tasks (upper bound) by more than 40%, while requiring access to only 0.6% of the original data.
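To make the headline figure concrete, one plausible reading of the gap-closure metric (an assumption on our part; the precise definition follows in the body of the paper) is the relative reduction in word error rate (WER) between the naively fine-tuned lower bound and the jointly trained upper bound:

% Sketch of the relative gap-closure metric, assuming WER as the evaluation measure.
\[
  \text{GapClosed} \;=\; \frac{\mathrm{WER}_{\text{FT}} - \mathrm{WER}_{\text{CL}}}{\mathrm{WER}_{\text{FT}} - \mathrm{WER}_{\text{Joint}}} \times 100\%,
\]

where $\mathrm{WER}_{\text{FT}}$ denotes the fine-tuned model (lower bound), $\mathrm{WER}_{\text{Joint}}$ the model trained jointly on all tasks (upper bound), and $\mathrm{WER}_{\text{CL}}$ the model trained with the CL method under evaluation; a value above 40\% then corresponds to the reported result.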