Modeling room acoustics in a field setting involves some degree of blind parameter estimation from noisy and reverberant audio. Modern approaches use convolutional neural networks (CNNs) together with time-frequency representations. Computing these spectrogram-like features with short-time Fourier transforms (STFTs) has shown promising results, but this approach implicitly discards a significant amount of audio information in the phase domain. Inspired by recent work in speech enhancement, we propose novel phase-related features that extend recent approaches to blindly estimating the so-called "reverberation fingerprint" parameters, namely volume and RT60. Adding these features is shown to outperform existing methods that rely solely on magnitude-based spectral features across a wide range of acoustic spaces. We evaluate the effectiveness of these novel features in both single-parameter and multi-parameter estimation strategies, using a novel dataset that consists of publicly available room impulse responses (RIRs), synthesized RIRs, and in-house measurements of real acoustic spaces.
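The abstract does not specify which phase-related features are used, so the following is only a minimal sketch of how phase information can be kept alongside the usual magnitude spectrogram. It assumes two phase derivatives that are common in the speech-enhancement literature: the instantaneous-frequency deviation (phase advance between frames minus the advance expected for each bin) and the group delay (negative phase difference across frequency). The function names `stft`, `wrap`, and `magnitude_and_phase_features`, the Hann window, and the default `n_fft`/`hop` values are hypothetical choices for illustration, not the authors' implementation. The three feature maps are cropped to a common grid and stacked as channels, one plausible way to feed them to a CNN.

```python
import numpy as np


def stft(x: np.ndarray, n_fft: int = 512, hop: int = 128) -> np.ndarray:
    """Complex STFT with a periodic Hann window; returns shape (bins, frames)."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window of length n_fft
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T


def wrap(p: np.ndarray) -> np.ndarray:
    """Wrap phase values into (-pi, pi]."""
    return np.angle(np.exp(1j * p))


def magnitude_and_phase_features(
    x: np.ndarray, n_fft: int = 512, hop: int = 128, eps: float = 1e-8
) -> np.ndarray:
    """Hypothetical feature extractor: log magnitude plus two phase derivatives.

    Returns an array of shape (3, bins - 1, frames - 1):
      channel 0: log-magnitude spectrogram in dB
      channel 1: instantaneous-frequency deviation (radians per hop)
      channel 2: group delay (radians per bin, sign-flipped phase slope)
    """
    X = stft(x, n_fft, hop)
    phase = np.angle(X)
    log_mag = 20.0 * np.log10(np.abs(X) + eps)

    # Phase advance a stationary sinusoid centred on bin k would show per hop.
    k = np.arange(X.shape[0])[:, None]
    expected_advance = 2.0 * np.pi * k * hop / n_fft
    ifd = wrap(np.diff(phase, axis=1) - expected_advance)  # (bins, frames - 1)

    # Negative phase slope across frequency (group delay, up to a scale factor).
    gd = -wrap(np.diff(phase, axis=0))  # (bins - 1, frames)

    # Crop all maps to a common (bins - 1, frames - 1) grid and stack as channels.
    n_f, n_t = X.shape[0] - 1, X.shape[1] - 1
    return np.stack([log_mag[:n_f, :n_t], ifd[:n_f, :n_t], gd[:n_f, :n_t]])
```

As a quick check, a 1 kHz cosine sampled at 16 kHz falls exactly on bin 32 when `n_fft=512`, so its instantaneous-frequency deviation at that bin should be essentially zero, while the magnitude channel peaks there. A magnitude-only pipeline would keep just channel 0; the other two channels carry the phase information the abstract argues is otherwise discarded.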