Cycle-consistent generative adversarial networks (CycleGANs) have shown promising performance for speech enhancement (SE), but CycleGAN-based SE systems have an intractable shortcoming: noise components propagate throughout the cycle and cannot be completely eliminated. In addition, conventional CycleGAN-based SE systems estimate only the spectral magnitude and leave the phase unaltered. Motivated by the multi-stage learning concept, we propose a novel two-stage denoising system that combines a CycleGAN-based magnitude enhancement network with a subsequent complex spectral refining network. In the first stage, a CycleGAN-based model estimates only the magnitude, which is then coupled with the original noisy phase to obtain a coarsely enhanced complex spectrum. In the second stage, a complex spectral mapping network further suppresses the residual noise components and estimates the clean phase; this network is purely complex-valued and composed of complex 2D convolution/deconvolution and complex temporal-frequency attention blocks. Experimental results on two public datasets demonstrate that the proposed approach consistently surpasses previous one-stage CycleGANs and other state-of-the-art SE systems on various evaluation metrics, especially in background noise suppression.
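The two-stage data flow described above can be illustrated with a minimal sketch. It shows only how an estimated magnitude is paired with the unaltered noisy phase and then handed to a refinement stage. The function names, the list-of-frames spectrum layout, and the `magnitude_model` and `refiner` callables are all hypothetical stand-ins for the trained networks, not the authors' implementation; a real system would likely operate on STFT tensors in an array library rather than on nested lists.

```python
import cmath


def couple_with_noisy_phase(enhanced_mag, noisy_spec):
    """Pair each estimated magnitude with the phase of the matching noisy bin.

    Both arguments are lists of frames; each frame is a list of frequency bins.
    (Hypothetical layout chosen for illustration.)
    """
    return [
        [cmath.rect(m, cmath.phase(x)) for m, x in zip(mag_frame, spec_frame)]
        for mag_frame, spec_frame in zip(enhanced_mag, noisy_spec)
    ]


def two_stage_enhance(noisy_spec, magnitude_model, refiner):
    """Sketch of the pipeline: magnitude-only stage, then complex refinement.

    magnitude_model and refiner are assumed callables standing in for the
    CycleGAN-based generator and the complex spectral mapping network.
    """
    # Stage 1: the first model sees and estimates magnitudes only.
    noisy_mag = [[abs(x) for x in frame] for frame in noisy_spec]
    enhanced_mag = magnitude_model(noisy_mag)
    # The noisy phase is reused unchanged to form a coarse complex spectrum.
    coarse_spec = couple_with_noisy_phase(enhanced_mag, noisy_spec)
    # Stage 2: the refiner works on the complex spectrum (magnitude and phase).
    return refiner(coarse_spec)


if __name__ == "__main__":
    # Toy stand-ins: halve every magnitude, then apply no refinement.
    halve = lambda mag: [[0.5 * m for m in frame] for frame in mag]
    identity = lambda spec: spec
    spectrum = [[3 + 4j, 1 - 1j], [-2 + 0j, 0 + 2j]]
    print(two_stage_enhance(spectrum, halve, identity))
```

With these toy stand-ins, the output is the input spectrum scaled by one half, which makes the limitation of the first stage visible: it changes magnitudes but leaves every phase at its noisy value, so any phase correction has to come from the second stage.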