Mask processing in the time-frequency (T-F) domain with neural networks has become one of the mainstream approaches to single-channel speech enhancement. However, most models struggle when harmonics are partially masked by noise. To tackle this challenge, we propose a harmonic gated compensation network (HGCN). We design a high-resolution harmonic integral spectrum to improve the accuracy of harmonic location prediction. We then add voice activity detection (VAD) and voiced region detection (VRD) to the convolutional recurrent network (CRN) to filter the predicted harmonic locations. Finally, a harmonic gating mechanism guides the compensation model in adjusting the coarse results from the CRN to obtain refined enhanced results. Our experiments show that HGCN achieves substantial gains over a number of advanced approaches in the community.
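To make the pipeline above concrete, the following is a minimal single-frame sketch in NumPy, not the HGCN implementation. It assumes a much simplified stand-in for each stage: a harmonic integral is taken to be the sum of spectral magnitude at integer multiples of a candidate fundamental, the VAD/VRD decision is reduced to a single boolean `voiced` flag, and the compensation model is reduced to a fixed multiplicative `gain`. All function names, parameters, and the gating formula are hypothetical and chosen only for illustration.

```python
import numpy as np


def harmonic_integral(mag, sr, n_fft, f0_candidates, n_harmonics=5):
    """Score each candidate f0 by summing magnitude at its harmonic bins.

    Assumption: a plain sum over the first `n_harmonics` multiples stands in
    for the high-resolution harmonic integral spectrum.
    """
    scores = np.zeros(len(f0_candidates))
    for i, f0 in enumerate(f0_candidates):
        freqs = f0 * np.arange(1, n_harmonics + 1)
        bins = np.round(freqs * n_fft / sr).astype(int)
        bins = bins[bins < len(mag)]  # drop harmonics above Nyquist
        scores[i] = mag[bins].sum()
    return scores


def harmonic_gate(f0, sr, n_fft, n_bins, voiced):
    """Binary gate over frequency bins: 1 at harmonic locations, else 0.

    Assumption: `voiced` plays the role of the VAD/VRD filtering; an
    unvoiced frame yields an all-zero gate.
    """
    gate = np.zeros(n_bins)
    if voiced:
        k = np.arange(1, int((sr / 2) // f0) + 1)
        bins = np.round(k * f0 * n_fft / sr).astype(int)
        gate[bins[bins < n_bins]] = 1.0
    return gate


def compensate(coarse, gain, gate):
    """Boost the coarse estimate only where the gate is open (hypothetical rule)."""
    return coarse * (1.0 + gate * gain)


# Synthetic frame: five harmonics of a 200 Hz fundamental.
sr, n_fft = 16000, 512
n_bins = n_fft // 2 + 1
mag = np.zeros(n_bins)
true_bins = np.round(200 * np.arange(1, 6) * n_fft / sr).astype(int)
mag[true_bins] = 1.0

f0_candidates = np.arange(100, 401, 10)
scores = harmonic_integral(mag, sr, n_fft, f0_candidates)
best_f0 = int(f0_candidates[np.argmax(scores)])

gate = harmonic_gate(best_f0, sr, n_fft, n_bins, voiced=True)
coarse = np.ones(n_bins)  # placeholder for the coarse CRN output
refined = compensate(coarse, gain=0.5, gate=gate)
```

In this toy setting the gate leaves non-harmonic bins untouched, so the compensation step can only change the coarse estimate at predicted harmonic locations in voiced frames, which is the role the gating mechanism plays in the description above.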