Deep-learning-based speech enhancement has recently received considerable research attention. By exploiting the spatial information in microphone signals, microphone arrays can be advantageous over single-microphone systems under some adverse acoustic conditions. However, multichannel speech enhancement is often performed in the short-time Fourier transform (STFT) domain, which makes the approach computationally expensive. To remedy this problem, we propose a novel equivalent rectangular bandwidth (ERB)-scaled spatial coherence feature that depends on the target speaker activity between two ERB bands. Experiments with a four-microphone array in a reverberant environment with speech interference demonstrated the efficacy of the proposed system. The study also showed that a network trained with the ERB-scaled spatial feature was robust to variations in the geometry and number of microphones in the array.
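To illustrate why moving from STFT bins to ERB bands reduces the feature dimension, the following is a minimal sketch of a band-pooled inter-microphone coherence computed on ERB-spaced bands. It is not the feature defined in this work, whose exact formulation is not given in the abstract. The function names (`erb_band_edges`, `erb_coherence`), the Glasberg-Moore ERB-number formula, the choice of 32 bands, and the 16 kHz / 512-point STFT settings are all assumptions made for illustration.

```python
import numpy as np

def hz_to_erb_number(f_hz):
    # Glasberg-Moore ERB-number scale (assumed here; the paper may use another variant).
    return 21.4 * np.log10(1.0 + 0.00437 * f_hz)

def erb_number_to_hz(erb):
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437

def erb_band_edges(n_bands, fs, n_fft):
    """STFT-bin edges (length n_bands + 1) equally spaced on the ERB-number scale."""
    n_bins = n_fft // 2 + 1
    erb_max = hz_to_erb_number(fs / 2.0)
    edges_hz = erb_number_to_hz(np.linspace(0.0, erb_max, n_bands + 1))
    edges = np.round(edges_hz / (fs / n_fft)).astype(int)
    # Force strictly increasing edges so every band holds at least one bin.
    for i in range(1, len(edges)):
        edges[i] = max(edges[i], edges[i - 1] + 1)
    if edges[-2] >= n_bins:
        raise ValueError("too many bands for this FFT size")
    edges[-1] = n_bins  # include the Nyquist bin in the last band
    return edges

def stft(x, n_fft=512, hop=256):
    """Plain Hann-windowed STFT, shape (frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def erb_coherence(X1, X2, edges, eps=1e-12):
    """Complex coherence between two channels, pooled over the bins of each band.

    Note: a band with a single bin trivially yields magnitude 1 per frame;
    a practical system would also smooth the spectra over time.
    """
    n_bands = len(edges) - 1
    out = np.zeros((X1.shape[0], n_bands), dtype=complex)
    for b in range(n_bands):
        lo, hi = edges[b], edges[b + 1]
        cross = np.sum(X1[:, lo:hi] * np.conj(X2[:, lo:hi]), axis=1)
        p1 = np.sum(np.abs(X1[:, lo:hi]) ** 2, axis=1)
        p2 = np.sum(np.abs(X2[:, lo:hi]) ** 2, axis=1)
        out[:, b] = cross / (np.sqrt(p1 * p2) + eps)
    return out

# Demo with synthetic noise (hypothetical settings: 16 kHz, 512-point FFT, 32 bands).
rng = np.random.default_rng(0)
fs, n_fft, n_bands = 16000, 512, 32
edges = erb_band_edges(n_bands, fs, n_fft)
a = rng.standard_normal(fs)
b = rng.standard_normal(fs)
g_same = erb_coherence(stft(a), stft(a), edges)    # identical channels
g_indep = erb_coherence(stft(a), stft(b), edges)   # independent channels
print(g_same.shape)
```

With these assumed settings, each frame is described by 32 band values per microphone pair instead of 257 STFT bins, which is the kind of dimensionality reduction that motivates working on an ERB scale.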