Continual learning (CL), or domain expansion, has recently become a popular topic for automatic speech recognition (ASR) acoustic modeling because practical systems must be updated frequently to work robustly on types of speech not observed during initial training. While sequential adaptation allows tuning a system to a new domain, it may result in performance degradation on the old domains due to catastrophic forgetting. In this work we explore regularization-based CL for neural network acoustic models trained with the lattice-free maximum mutual information (LF-MMI) criterion. We simulate domain expansion by incrementally adapting the acoustic model on different public datasets that include several accents and speaking styles. We investigate two well-known CL techniques, elastic weight consolidation (EWC) and learning without forgetting (LWF), which aim to reduce forgetting by preserving model weights or network outputs, respectively. We additionally introduce a sequence-level LWF regularization, which exploits posteriors from the denominator graph of LF-MMI to further reduce forgetting. Empirical results show that the proposed sequence-level LWF can improve the best average word error rate across all domains by up to 9.4% relative compared with regular LWF.
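As a rough illustration of the two regularizers mentioned above, the minimal PyTorch sketch below shows an EWC penalty (which discourages changing parameters that were important on the old domain) and a frame-level LWF penalty (which distills the old model's output distribution into the adapted model). This is not the paper's implementation; names such as `old_params`, `fisher`, `lambda_ewc`, and `lambda_lwf` are placeholders. The proposed sequence-level LWF would replace the frame-level softmax posteriors with posteriors computed over the LF-MMI denominator graph, which is not reproduced here.

```python
# Illustrative sketch of EWC and frame-level LWF regularizers (assumed
# PyTorch formulation, not the paper's code).
import torch
import torch.nn.functional as F


def ewc_penalty(model, old_params, fisher, lambda_ewc=1.0):
    """Elastic weight consolidation: penalize moving parameters that carry
    high (diagonal) Fisher information for the previously learned domain."""
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return lambda_ewc * penalty


def lwf_penalty(new_logits, old_logits, temperature=1.0, lambda_lwf=1.0):
    """Learning without forgetting: keep the adapted model's frame-level
    output distribution close to the old model's via a KL divergence."""
    log_p_new = F.log_softmax(new_logits / temperature, dim=-1)
    p_old = F.softmax(old_logits / temperature, dim=-1)
    return lambda_lwf * F.kl_div(log_p_new, p_old, reduction="batchmean")
```

In a CL setup, one of these penalties would simply be added to the new-domain training loss, with the regularization weight controlling the trade-off between adapting to the new domain and retaining performance on the old ones.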