This paper presents our latest investigations into improving automatic speech recognition (ASR) for noisy speech via speech enhancement. We propose a novel method, Multi-discriminators CycleGAN, that reduces the noise in the input speech and thereby improves ASR performance. The method leverages the CycleGAN framework for speech enhancement without any parallel data and improves on it by introducing multiple discriminators, each of which checks a different frequency area. Furthermore, we show that training multiple generators, each on a homogeneous subset of the training data, is better than training one generator on all the training data. We evaluate our method on the CHiME-3 dataset and observe a relative word error rate (WER) improvement of up to 10.03% on the development set and up to 14.09% on the evaluation set.
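The reported gains are relative rather than absolute reductions in WER. As a minimal sketch of how such a figure is usually computed, the snippet below uses a hypothetical helper, `relative_wer_improvement`, and hypothetical WER values chosen only for illustration; neither comes from the paper.

```python
def relative_wer_improvement(baseline_wer: float, enhanced_wer: float) -> float:
    """Return the relative WER reduction, in percent, of enhanced over baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - enhanced_wer) / baseline_wer


# Hypothetical WERs (in percent), for illustration only; not results from the paper.
baseline = 20.0
enhanced = 18.0
print(f"{relative_wer_improvement(baseline, enhanced):.2f}% relative improvement")
# prints "10.00% relative improvement"
```

Under these assumed numbers, an absolute drop of 2 WER points corresponds to a 10% relative improvement, which is why relative figures should always be read alongside the baseline WER.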