Directly extending deep neural network (DNN) based wide-band speech enhancement (SE) to full-band processing is hindered by low frequency resolution in the low-frequency range, which is likely to degrade model performance. In this paper, we propose a learnable spectral compression mapping (SCM) that compresses the high-frequency components so that they can be processed more efficiently. As a result, the model can pay more attention to the low- and middle-frequency range, where most of the speech power is concentrated. Instead of suppressing noise with a single network, we first estimate a spectral magnitude mask that converts the speech to a high signal-to-noise ratio (SNR) state, and then use a subsequent model to further refine the real and imaginary masks of the pre-enhanced signal. We conduct comprehensive experiments to validate the efficacy of the proposed method.
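The two ideas above can be illustrated with a minimal NumPy sketch. It is not the proposed model: the compression matrix here is fixed rather than learned (it could at most serve as an initialization for a learnable SCM), and the function names, the 481-bin spectrum, the 161 bins kept at full resolution, and the 64 high-frequency bands are all hypothetical choices made only for illustration. The masks are placeholders standing in for network outputs.

```python
import numpy as np


def build_compression_matrix(n_bins, n_keep, n_high_bands):
    """Fixed stand-in for a spectral compression mapping (assumption, not learned).

    The first n_keep bins pass through unchanged; the remaining
    high-frequency bins are averaged into n_high_bands contiguous bands.
    Returns a matrix of shape (n_keep + n_high_bands, n_bins).
    """
    W = np.zeros((n_keep + n_high_bands, n_bins))
    # Identity on the low-frequency bins: full resolution is preserved.
    W[np.arange(n_keep), np.arange(n_keep)] = 1.0
    # Split the high-frequency bin indices into contiguous groups.
    groups = np.array_split(np.arange(n_keep, n_bins), n_high_bands)
    for band, idx in enumerate(groups):
        W[n_keep + band, idx] = 1.0 / len(idx)  # average within the band
    return W


def apply_two_stage(noisy_spec, mag_mask, complex_mask):
    """Apply a magnitude mask, then a complex (real + imaginary) mask.

    Stage 1 scales the magnitude and keeps the noisy phase.
    Stage 2 multiplies the pre-enhanced spectrum by a complex mask,
    which can adjust both magnitude and phase.
    """
    pre_enhanced = mag_mask * noisy_spec
    return complex_mask * pre_enhanced


# Hypothetical sizes: 481 frequency bins, 100 frames.
rng = np.random.default_rng(0)
noisy = rng.standard_normal((481, 100)) + 1j * rng.standard_normal((481, 100))

W = build_compression_matrix(n_bins=481, n_keep=161, n_high_bands=64)
compressed_mag = W @ np.abs(noisy)  # compressed features a network could consume
print(compressed_mag.shape)

# Placeholder masks; in practice each would be estimated by a network.
mag_mask = np.full(noisy.shape, 0.5)
complex_mask = np.ones(noisy.shape, dtype=complex)
enhanced = apply_two_stage(noisy, mag_mask, complex_mask)
```

In this sketch the compressed representation has 225 rows instead of 481, so most of the input dimension is spent on the low and middle frequencies. How the masks are mapped back from the compressed domain to the full-band spectrum is not specified in the text above and is left out here.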