Convolutional recurrent networks (CRNs), which integrate a convolutional encoder-decoder (CED) structure with a recurrent structure, have achieved promising performance for monaural speech enhancement. However, feature representation across the frequency context is highly constrained by the limited receptive fields of the convolutions in the CED. In this paper, we propose a convolutional recurrent encoder-decoder (CRED) structure to boost feature representation along the frequency axis. The CRED applies frequency recurrence to the 3D convolutional feature maps along the frequency axis after each convolution; it can therefore capture long-range frequency correlations and enhance the feature representations of speech inputs. The proposed frequency recurrence is realized efficiently using a feedforward sequential memory network (FSMN). In addition to the CRED, we insert two stacked FSMN layers between the encoder and the decoder to further model temporal dynamics. We name the proposed framework Frequency Recurrent CRN (FRCRN). We design FRCRN to predict the complex Ideal Ratio Mask (cIRM) in the complex-valued domain and optimize it using both time-frequency-domain and time-domain losses. Our proposed approach achieved state-of-the-art performance on wideband benchmark datasets and took second place in the real-time fullband track in terms of Mean Opinion Score (MOS) and Word Accuracy (WAcc) in the ICASSP 2022 Deep Noise Suppression (DNS) challenge (https://github.com/alibabasglab/FRCRN).
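To make two of the ideas above concrete, the following is a minimal NumPy sketch, not the FRCRN implementation. It assumes a feature map shaped (channels, time, frequency) and illustrates (a) an FSMN-style memory applied along the frequency axis, here simplified to look-back taps over lower frequency bins only, and (b) applying a predicted complex ratio mask to a noisy spectrum by complex multiplication. The function names, the `taps` parameter, and the one-directional memory are illustrative assumptions; the linked repository should be consulted for the actual layer design.

```python
import numpy as np


def fsmn_memory_along_frequency(x, taps):
    """FSMN-style memory over the frequency axis (illustrative sketch).

    x    : array of shape (channels, time, freq), a convolutional feature map.
    taps : array of shape (channels, order), hypothetical per-channel filter
           coefficients (these would be learned in a real model).

    Each frequency bin receives its own value plus a weighted sum of the
    `order` bins below it, so information propagates along frequency
    without a sequential recurrent loop over hidden states.
    """
    x = np.array(x, dtype=float)          # copy so the input is not modified
    taps = np.asarray(taps, dtype=float)
    n_freq = x.shape[2]
    order = taps.shape[1]
    out = x.copy()
    for k in range(1, order + 1):
        if k >= n_freq:
            break
        # Add the feature map shifted by k bins, scaled per channel.
        out[:, :, k:] += taps[:, k - 1][:, None, None] * x[:, :, :-k]
    return out


def apply_complex_mask(mask_real, mask_imag, noisy_real, noisy_imag):
    """Apply a complex ratio mask M to a noisy spectrum Y as S = M * Y."""
    est_real = mask_real * noisy_real - mask_imag * noisy_imag
    est_imag = mask_real * noisy_imag + mask_imag * noisy_real
    return est_real, est_imag


if __name__ == "__main__":
    features = np.ones((1, 1, 4))         # 1 channel, 1 frame, 4 frequency bins
    coeffs = np.array([[0.5, 0.25]])      # hypothetical taps, memory order 2
    print(fsmn_memory_along_frequency(features, coeffs))

    real, imag = apply_complex_mask(np.array([1.0]), np.array([2.0]),
                                    np.array([3.0]), np.array([4.0]))
    print(real, imag)
```

The shift-and-add form is what makes this kind of memory cheap: each tap is a single vectorized operation over the whole feature map, whereas a conventional recurrent layer must step through the frequency bins one at a time.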