Automatic speech recognition (ASR) technologies today are primarily optimized for a given dataset; any change in the application environment (e.g., acoustic conditions or topic domains) can therefore degrade performance. We can collect new data describing the new environment and fine-tune the system on it, but this naturally leads to higher error rates on the earlier datasets, a phenomenon referred to as catastrophic forgetting. The concept of lifelong learning (LLL), which aims to enable a machine to sequentially learn new tasks from new datasets describing the changing real world without forgetting previously acquired knowledge, has thus attracted attention. This paper reports, to our knowledge, the first effort to extensively consider and analyze various LLL approaches in end-to-end (E2E) ASR, including novel methods for selecting and saving data from past domains to mitigate catastrophic forgetting. When sequentially learning on three very different benchmark corpora, we achieved an overall relative reduction of 28.7% in word error rate (WER) compared to the fine-tuning baseline. This can be a first step toward the highly desired ASR technologies capable of synchronizing with the continuously changing real world.
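The data-saving idea mentioned above belongs to the rehearsal family of LLL methods: keep a small memory of past-domain samples and mix them into the training batches for each new domain. The following is a minimal, framework-agnostic sketch of that idea (not the paper's actual selection method); the `ReplayBuffer` class, its reservoir-sampling policy, and the `replay_fraction` parameter are illustrative assumptions.

```python
import random


class ReplayBuffer:
    """Hypothetical fixed-size memory of past-domain training samples."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.samples = []
        self.seen = 0  # total samples ever offered to the buffer
        self.rng = random.Random(seed)

    def add(self, sample):
        # Reservoir sampling: keeps a uniform random subset of all
        # samples seen so far, regardless of stream length.
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(sample)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.samples[j] = sample

    def mix_batch(self, new_batch, replay_fraction=0.5):
        # Replace a fraction of the new-domain batch with stored
        # past-domain samples, so gradients keep covering old domains.
        k = min(int(len(new_batch) * replay_fraction), len(self.samples))
        replay = self.rng.sample(self.samples, k) if k else []
        return new_batch[: len(new_batch) - k] + replay


# Example: after training on domain A, retain a subset of its samples
# and rehearse them while fine-tuning on domain B.
buf = ReplayBuffer(capacity=100)
for utt_id in range(1000):  # stream of 1000 domain-A utterances
    buf.add(("domainA", utt_id))

batch_b = [("domainB", i) for i in range(16)]
mixed = buf.mix_batch(batch_b, replay_fraction=0.25)
```

With a 16-sample batch and `replay_fraction=0.25`, four slots are filled with stored domain-A samples; the buffer size (not the full past corpus) bounds the storage cost, which is the practical appeal of rehearsal-based LLL.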