We present a Generative Adversarial Network (GAN) based generator that produces realistic synthetic room impulse responses. The proposed generator creates synthetic room impulse responses by parametrically controlling the acoustic features captured in real-world room impulse responses. Our GAN-based room impulse response generator (IR-GAN) can improve far-field automatic speech recognition in environments unseen during training. We create a far-field speech training set by augmenting the clean LibriSpeech dataset with our synthesized room impulse responses. We evaluate the quality of our room impulse responses on a real-world LibriSpeech test set created using real impulse responses from the BUT ReverbDB and AIR datasets. Furthermore, we combine our synthetic data with synthetic impulse responses generated using acoustic simulators; this combination reduces the word error rate by up to 14.3% on far-field speech recognition benchmarks.
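The augmentation step described above is commonly realized by convolving each clean utterance with a room impulse response. The following is a minimal sketch under that assumption, not the authors' pipeline: `toy_rir` and `reverberate` are hypothetical names, and the toy impulse response (exponentially decaying noise) merely stands in for an IR-GAN output or a measured response.

```python
import math
import random


def toy_rir(sample_rate=16000, rt60=0.5, seed=0):
    """Hypothetical stand-in for a room impulse response.

    Gaussian noise under an exponential envelope whose amplitude has
    dropped by 60 dB at t = rt60 seconds. Not an IR-GAN output.
    """
    rng = random.Random(seed)
    n = round(sample_rate * rt60)
    k = math.log(1000.0) / rt60  # 60 dB corresponds to an amplitude factor of 1/1000
    rir = [rng.gauss(0.0, 1.0) * math.exp(-k * i / sample_rate) for i in range(n)]
    rir[0] = 1.0  # direct-path component
    return rir


def reverberate(clean, rir):
    """Convolve clean speech with an impulse response.

    The result is truncated to the length of the clean signal and
    rescaled so that its peak matches the peak of the input.
    """
    out = [0.0] * len(clean)
    for i, x in enumerate(clean):
        if x == 0.0:
            continue
        for j, h in enumerate(rir):
            if i + j >= len(out):
                break
            out[i + j] += x * h
    peak_in = max((abs(v) for v in clean), default=0.0)
    peak_out = max((abs(v) for v in out), default=0.0)
    if peak_out > 0.0:
        scale = peak_in / peak_out
        out = [v * scale for v in out]
    return out


if __name__ == "__main__":
    # A short synthetic "utterance": 50 ms of a 440 Hz tone at 16 kHz.
    sr = 16000
    clean = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 20)]
    far_field = reverberate(clean, toy_rir(sample_rate=sr, rt60=0.05))
    print(len(clean), len(far_field))
```

The pure-Python convolution keeps the sketch dependency-free; for real corpora an FFT-based convolution would be far faster. Truncating to the clean length and matching peaks are illustrative choices, not details taken from the abstract.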