Speech enhancement (SE) has proved effective in reducing noise from noisy speech signals for downstream automatic speech recognition (ASR), where a multi-task learning strategy is employed to jointly optimize the two tasks. However, the enhanced speech learned under the SE objective does not always yield good ASR results. From an optimization perspective, there sometimes exists interference between the gradients of the SE and ASR tasks, which can hinder multi-task learning and ultimately lead to sub-optimal ASR performance. In this paper, we propose a simple yet effective approach called gradient remedy (GR) to resolve the interference between task gradients in noise-robust speech recognition, from the perspectives of both angle and magnitude. Specifically, we first project the SE task's gradient onto a dynamic surface that forms an acute angle with the ASR gradient, in order to remove the conflict between them and assist ASR optimization. Furthermore, we adaptively rescale the magnitudes of the two gradients to prevent the dominant ASR task from being misled by the SE gradient. Experimental results show that the proposed approach resolves the gradient interference well and achieves relative word error rate (WER) reductions of 9.3% and 11.1% over the multi-task learning baseline on the RATS and CHiME-4 datasets, respectively. Our code is available on GitHub.
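To make the two-step idea concrete, the following is a minimal NumPy sketch of a gradient-remedy-style update, written from the description above rather than from the released code: when the SE gradient conflicts with the ASR gradient, it is rotated to a fixed acute angle with the ASR gradient, and its magnitude is then capped so it cannot dominate. The function name `gradient_remedy`, the angle parameter `theta`, and the specific rotation and rescaling rules are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def gradient_remedy(g_se, g_asr, theta=np.pi / 4, eps=1e-12):
    """Illustrative sketch (assumed, not the authors' exact method).

    If the SE gradient conflicts with the ASR gradient (negative cosine),
    rotate the SE gradient so it forms an acute angle `theta` with the ASR
    gradient, then rescale it so it cannot overwhelm the ASR gradient.
    Returns a combined update direction for multi-task optimization.
    """
    g_se = np.asarray(g_se, dtype=float)
    g_asr = np.asarray(g_asr, dtype=float)

    cos = g_se @ g_asr / (np.linalg.norm(g_se) * np.linalg.norm(g_asr) + eps)
    if cos < 0:  # conflicting gradients: remedy the SE gradient
        # Decompose g_se into components parallel and orthogonal to g_asr.
        unit_asr = g_asr / (np.linalg.norm(g_asr) + eps)
        parallel = (g_se @ unit_asr) * unit_asr
        orthogonal = g_se - parallel
        unit_orth = orthogonal / (np.linalg.norm(orthogonal) + eps)
        # Re-compose g_se at angle `theta` (< 90 degrees) to the ASR gradient,
        # keeping its original magnitude.
        g_se = np.linalg.norm(g_se) * (
            np.cos(theta) * unit_asr + np.sin(theta) * unit_orth
        )

    # Adaptive magnitude rescaling: cap the SE gradient's norm at the ASR
    # gradient's norm, so the dominant ASR task is not misled.
    scale = min(1.0, np.linalg.norm(g_asr) / (np.linalg.norm(g_se) + eps))
    return scale * g_se + g_asr
```

In a joint SE-ASR training loop, the per-task gradients would be computed separately (e.g., from the SE loss and the ASR loss) and this combined direction used for the parameter update; the actual projection surface in the paper is dynamic rather than a fixed angle.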