We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) that generates room impulse responses (RIRs) for a given acoustic environment. FAST-RIR takes rectangular room dimensions, listener and speaker positions, and the reverberation time as inputs, and generates both specular and diffuse reflections for that environment. The generated RIRs match the input reverberation time with an average error of 0.02 s. We evaluate the generated RIRs in automatic speech recognition (ASR) applications using the Google Speech API, the Microsoft Speech API, and the Kaldi toolkit. We show that FAST-RIR with a batch size of 1 is 400 times faster than a state-of-the-art diffuse acoustic simulator (DAS) on a CPU, while giving ASR performance similar to DAS. FAST-RIR is also 12 times faster than an existing GPU-based RIR generator (gpuRIR), and it outperforms gpuRIR by 2.5% on the AMI far-field ASR benchmark.
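The abstract reports a reverberation-time error but does not say how the reverberation time of a generated RIR is measured. A common approach, assumed here and not confirmed by the source, is Schroeder backward integration of the RIR energy followed by a linear fit of the decay curve between -5 dB and -25 dB, extrapolated to 60 dB (the T20 method). The sketch below illustrates that measurement in pure Python. The helper names `schroeder_decay_db`, `estimate_t60`, and `synthetic_rir` are hypothetical, and the exponentially decaying noise signal is only a stand-in for an RIR produced by FAST-RIR.

```python
import math
import random


def schroeder_decay_db(rir):
    """Schroeder backward-integrated energy decay curve (EDC), in dB relative to total energy."""
    energy = [x * x for x in rir]
    total = sum(energy)
    if total <= 0.0:
        raise ValueError("RIR has zero energy")
    edc_db = [0.0] * len(energy)
    acc = 0.0
    # Integrate backwards: acc holds the energy remaining from sample i to the end.
    for i in range(len(energy) - 1, -1, -1):
        acc += energy[i]
        edc_db[i] = 10.0 * math.log10(acc / total) if acc > 0.0 else float("-inf")
    return edc_db


def estimate_t60(rir, fs, start_db=-5.0, stop_db=-25.0):
    """Estimate T60 with a least-squares line fitted to the EDC between start_db and
    stop_db (T20 method), extrapolating the fitted decay rate to 60 dB."""
    edc_db = schroeder_decay_db(rir)
    points = [(i / fs, d) for i, d in enumerate(edc_db) if stop_db <= d <= start_db]
    if len(points) < 2:
        raise ValueError("EDC does not span the requested fitting range")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_d = sum(d for _, d in points) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in points)
    var = sum((t - mean_t) ** 2 for t, _ in points)
    slope_db_per_s = cov / var  # negative for a decaying response
    return -60.0 / slope_db_per_s


def synthetic_rir(t60, fs, duration, noise=True, seed=0):
    """Hypothetical stand-in for a generated RIR: an exponential envelope whose energy
    falls by 60 dB after t60 seconds, optionally modulated by Gaussian noise."""
    rng = random.Random(seed)
    decay = math.log(1000.0) / t60  # amplitude falls to 1e-3 (energy to 1e-6) at t60
    out = []
    for i in range(int(duration * fs)):
        envelope = math.exp(-decay * i / fs)
        out.append(envelope * (rng.gauss(0.0, 1.0) if noise else 1.0))
    return out


if __name__ == "__main__":
    fs = 16000
    for target in (0.3, 0.5, 0.8):
        rir = synthetic_rir(target, fs, duration=2.0 * target)
        est = estimate_t60(rir, fs)
        print(f"target {target:.2f} s, estimated {est:.3f} s, |error| {abs(est - target):.3f} s")
```

Averaging the absolute error over many (target, generated RIR) pairs gives a figure comparable in kind to the 0.02 s reported above, although the paper's exact measurement procedure may differ.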