This work describes a speech denoising system for machine ears that aims to improve speech intelligibility and the overall listening experience in noisy environments. We recorded approximately 100 hours of audio data, containing reverberation and moderate environmental noise, using a pair of microphone arrays, one placed around each ear, and then mixed the recordings to simulate adverse acoustic scenes. We then trained a multi-channel speech denoising network (MCSDN) on the mixed recordings. To improve training, we employ an unsupervised method, the complex angular central Gaussian mixture model (cACGMM), to extract cleaner speech from the noisy recordings to serve as the learning target. For the inference stage, we propose an MCSDN-Beamforming-MCSDN framework. Subjective evaluation results show that the cACGMM improves the training data, yielding better noise reduction and higher user preference, and that the overall system improves intelligibility and the listening experience in noisy situations.
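To make the two-stage inference pipeline concrete, the sketch below outlines one plausible reading of the MCSDN-Beamforming-MCSDN framework in the STFT domain. It is a minimal illustration under stated assumptions, not the paper's implementation: the two MCSDN passes are stubbed out as placeholder mask functions standing in for trained networks, and a mask-based MVDR beamformer is used as an example, since the abstract does not specify which beamformer is employed.

```python
# Minimal sketch of an MCSDN -> Beamforming -> MCSDN inference chain.
# Assumptions (not from the paper): STFT-domain processing, mask-based MVDR
# beamforming, and placeholder functions in place of the trained MCSDN stages.
import numpy as np

def mcsdn_stage1(noisy_stft):
    """Placeholder for the first MCSDN pass: returns a speech mask in [0, 1]
    per time-frequency bin (here a crude magnitude heuristic)."""
    mag = np.abs(noisy_stft).mean(axis=0)            # average over channels
    return mag / (mag + np.median(mag) + 1e-8)

def mcsdn_stage2(beamformed_stft):
    """Placeholder for the second MCSDN pass: single-channel refinement."""
    mag = np.abs(beamformed_stft)
    mask = mag / (mag + np.median(mag) + 1e-8)
    return mask * beamformed_stft

def mvdr_beamform(noisy_stft, speech_mask):
    """Mask-based MVDR: estimate speech/noise spatial covariances from the
    stage-1 mask and apply the resulting beamformer per frequency bin."""
    C, F, T = noisy_stft.shape
    out = np.zeros((F, T), dtype=complex)
    for f in range(F):
        X = noisy_stft[:, f, :]                      # (C, T)
        w_s = speech_mask[f]                         # (T,)
        phi_s = (w_s * X) @ X.conj().T / (w_s.sum() + 1e-8)
        phi_n = ((1 - w_s) * X) @ X.conj().T / ((1 - w_s).sum() + 1e-8)
        phi_n += 1e-6 * np.eye(C)                    # diagonal loading
        # Steering vector from the principal eigenvector of the speech covariance.
        _, vecs = np.linalg.eigh(phi_s)
        d = vecs[:, -1]
        w = np.linalg.solve(phi_n, d)
        w /= (d.conj() @ w) + 1e-8                   # w = phi_n^-1 d / (d^H phi_n^-1 d)
        out[f] = w.conj() @ X
    return out

# Toy multi-channel STFT: 4 channels, 257 frequency bins, 100 frames.
rng = np.random.default_rng(0)
noisy = rng.standard_normal((4, 257, 100)) + 1j * rng.standard_normal((4, 257, 100))

mask = mcsdn_stage1(noisy)            # stage 1: estimate a speech mask
bf = mvdr_beamform(noisy, mask)       # beamforming guided by the mask
enhanced = mcsdn_stage2(bf)           # stage 2: post-filter refinement
print(enhanced.shape)                 # (257, 100)
```

The design intuition this illustrates is that the first network supplies time-frequency evidence of where speech dominates, the beamformer exploits spatial information across the ear-mounted arrays, and the second network cleans up residual noise that spatial filtering alone cannot remove.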