Knowing the geometrical and acoustical parameters of a room may benefit applications such as audio augmented reality, speech dereverberation, or audio forensics. In this paper, we study the problem of blindly and jointly estimating the total surface area and volume of a room, as well as its frequency-dependent reverberation time and mean surface absorption, from two-channel noisy speech recordings made at multiple unknown source-receiver positions. A novel convolutional neural network architecture leveraging both single- and inter-channel cues is proposed and trained on a large, realistic simulated dataset. Results on both simulated and real data show that using multiple observations in one room significantly reduces the estimation errors and variances for all target quantities, and that using two channels improves the estimation of surface area and volume. The proposed model outperforms a recently proposed blind volume estimation method on the considered datasets.
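The four target quantities are not independent. Under the classical diffuse-field assumption, Sabine's formula T60 ≈ 0.161 V / (S ᾱ) ties the reverberation time T60 (s) to the volume V (m³), the total surface area S (m²) and the mean surface absorption coefficient ᾱ, and it can be applied band by band for frequency-dependent values. The sketch below illustrates only that textbook relation. It is not the proposed network, and the function names and the assumed speed of sound (343 m/s) are hypothetical choices made for this example.

```python
import math

# Assumed speed of sound at room temperature (m/s); not taken from the paper.
SPEED_OF_SOUND = 343.0


def sabine_constant(c: float = SPEED_OF_SOUND) -> float:
    """Sabine constant 24 ln(10) / c, about 0.161 s/m for c = 343 m/s."""
    return 24.0 * math.log(10.0) / c


def shoebox_volume_and_surface(lx: float, ly: float, lz: float) -> tuple[float, float]:
    """Volume (m^3) and total surface area (m^2) of a rectangular room."""
    volume = lx * ly * lz
    surface = 2.0 * (lx * ly + lx * lz + ly * lz)
    return volume, surface


def sabine_rt60(volume_m3: float, surface_m2: float, mean_absorption: float,
                c: float = SPEED_OF_SOUND) -> float:
    """Reverberation time (s) predicted by Sabine's formula for one frequency band."""
    if not 0.0 < mean_absorption <= 1.0:
        raise ValueError("mean absorption must lie in (0, 1]")
    return sabine_constant(c) * volume_m3 / (surface_m2 * mean_absorption)


def mean_absorption_from_rt60(rt60_s: float, volume_m3: float, surface_m2: float,
                              c: float = SPEED_OF_SOUND) -> float:
    """Invert Sabine's formula: mean absorption implied by T60, V and S."""
    if rt60_s <= 0.0:
        raise ValueError("reverberation time must be positive")
    return sabine_constant(c) * volume_m3 / (surface_m2 * rt60_s)


if __name__ == "__main__":
    # Hypothetical 5 m x 4 m x 3 m room: V = 60 m^3, S = 94 m^2.
    v, s = shoebox_volume_and_surface(5.0, 4.0, 3.0)
    # Hypothetical per-octave-band mean absorption coefficients.
    absorption_per_band = {125: 0.10, 500: 0.20, 2000: 0.30}
    for band_hz, alpha in absorption_per_band.items():
        t60 = sabine_rt60(v, s, alpha)
        print(f"{band_hz} Hz: T60 = {t60:.2f} s")
```

Given any three of V, S, T60 and ᾱ in a band, this relation fixes the fourth, which is one way to see why a joint estimate of all four can be checked for physical consistency. Sabine's formula is only an approximation: it is known to overestimate T60 in highly absorbing rooms, where Eyring's formula is often preferred.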