Training modern speech processing systems often requires large amounts of simulated room impulse response (RIR) data so that the systems generalize well to real-world reverberant environments. However, simulating realistic RIRs typically requires accurate physical modeling, and accelerating such simulation typically requires dedicated hardware such as a graphics processing unit (GPU). In this paper, we propose FRA-RIR, a fast random approximation of the widely used image-source method (ISM), to efficiently generate realistic RIR data without specialized computational devices. FRA-RIR replaces the physical simulation in the standard ISM with a series of random approximations, which significantly speeds up simulation and enables its use in on-the-fly data generation pipelines. Experiments show that FRA-RIR is not only significantly faster than existing ISM-based RIR simulation tools on standard computational platforms, but also improves the performance of speech denoising systems that are trained on simulated RIRs and evaluated on real-world RIRs. A Python implementation of FRA-RIR is available online\footnote{\url{https://github.com/yluo42/FRA-RIR}}.
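To make the idea of replacing physical simulation with random approximation concrete, the following is a toy sketch only and is not the FRA-RIR algorithm. Instead of computing image-source positions from room geometry, it draws the path lengths of virtual sources at random and attenuates each by spherical spreading and an exponential decay set by a target RT60. The function name `toy_random_rir`, its parameters, and the uniform sampling distribution are all assumptions made for illustration; the released implementation should be consulted for the actual method.

```python
import numpy as np


def toy_random_rir(rt60=0.5, direct_dist=2.0, n_images=1024,
                   sr=16000, c=343.0, seed=0):
    """Toy random RIR (hypothetical, NOT the FRA-RIR algorithm).

    rt60: target reverberation time in seconds.
    direct_dist: source-to-microphone distance in meters.
    n_images: number of randomly drawn virtual sources.
    """
    rng = np.random.default_rng(seed)
    max_dist = c * rt60  # distance sound travels within rt60 seconds
    if direct_dist >= max_dist:
        raise ValueError("direct_dist must be smaller than c * rt60")

    # Assumption: path lengths are uniform between the direct path and
    # max_dist. A real ISM derives them from the room geometry.
    dists = rng.uniform(direct_dist, max_dist, size=n_images)
    dists[0] = direct_dist  # keep one entry as the direct path

    # Convert each path length to an integer sample delay.
    delays = np.round(dists / c * sr).astype(int)

    # 1/d spreading loss times a decay that reaches -60 dB at t = rt60.
    t = dists / c
    amps = 10.0 ** (-3.0 * t / rt60) / dists

    # Random polarity for reflections; the direct path stays positive.
    signs = rng.choice([-1.0, 1.0], size=n_images)
    signs[0] = 1.0

    # Accumulate all impulses; np.add.at handles repeated delays.
    rir = np.zeros(delays.max() + 1)
    np.add.at(rir, delays, signs * amps)
    return rir / np.abs(rir).max()
```

Because such a filter is cheap to draw on a CPU, a data loader could in principle generate a fresh one per training example and apply it with `np.convolve(speech, rir)`, which is the kind of on-the-fly pipeline the abstract refers to.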