Speech enhancement (SE) approaches can be classified into supervised and unsupervised categories. For unsupervised SE, the well-known cycle-consistent generative adversarial network (CycleGAN) model, which comprises two generators and two discriminators, has been shown to provide a powerful nonlinear mapping ability and thus to achieve promising noise suppression. However, an inefficient training process, together with insufficient knowledge of the relationship between noisy and clean speech, may limit the enhancement performance of CycleGAN SE at runtime. In this study, we propose a novel noise-informed-training CycleGAN approach that incorporates additional inputs into the generators and discriminators to help the CycleGAN learn a more accurate transformation of speech signals between the noisy and clean domains. The additional input feature serves as an indicator that provides more information during the CycleGAN training stage. Experimental results confirm that the proposed approach improves the CycleGAN SE model, achieving better sound quality and fewer signal distortions.
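The idea of feeding an additional indicator input to both the generators and the discriminators can be illustrated with a minimal sketch. The abstract does not state what the indicator contains or how it is injected, so everything below is an assumption made for illustration: the indicator is a hypothetical per-utterance vector, it is concatenated to every feature frame, and single linear layers stand in for the real networks. The names `append_indicator`, `ToyGenerator`, and `ToyDiscriminator` are invented here and are not taken from the paper.

```python
import numpy as np


def append_indicator(features, indicator):
    """Tile a per-utterance indicator vector onto every frame.

    features:  (frames, feat_dim) spectral features
    indicator: (ind_dim,) auxiliary vector (hypothetical, e.g. a noise-type one-hot)
    returns:   (frames, feat_dim + ind_dim)
    """
    tiled = np.tile(indicator, (features.shape[0], 1))
    return np.concatenate([features, tiled], axis=1)


class ToyGenerator:
    """Single linear layer standing in for a real generator network."""

    def __init__(self, feat_dim, ind_dim, rng):
        self.weight = rng.normal(scale=0.1, size=(feat_dim + ind_dim, feat_dim))

    def __call__(self, features, indicator):
        # The indicator is an extra input; the output stays in feature space.
        return append_indicator(features, indicator) @ self.weight


class ToyDiscriminator:
    """Linear scorer that also sees the indicator (assumed conditioning)."""

    def __init__(self, feat_dim, ind_dim, rng):
        self.weight = rng.normal(scale=0.1, size=(feat_dim + ind_dim,))

    def __call__(self, features, indicator):
        logits = append_indicator(features, indicator) @ self.weight
        # Average frame logits, then squash to a probability-like score.
        return 1.0 / (1.0 + np.exp(-logits.mean()))


def cycle_consistency_loss(g_n2c, g_c2n, noisy, clean, indicator):
    """L1 cycle loss over noisy -> clean -> noisy and clean -> noisy -> clean."""
    noisy_cycle = g_c2n(g_n2c(noisy, indicator), indicator)
    clean_cycle = g_n2c(g_c2n(clean, indicator), indicator)
    return np.mean(np.abs(noisy_cycle - noisy)) + np.mean(np.abs(clean_cycle - clean))


rng = np.random.default_rng(0)
frames, feat_dim, ind_dim = 50, 8, 3
noisy = rng.normal(size=(frames, feat_dim))   # unpaired noisy features (random stand-in)
clean = rng.normal(size=(frames, feat_dim))   # unpaired clean features (random stand-in)
indicator = np.array([0.0, 1.0, 0.0])         # hypothetical noise-type one-hot

g_n2c = ToyGenerator(feat_dim, ind_dim, rng)  # noisy -> clean
g_c2n = ToyGenerator(feat_dim, ind_dim, rng)  # clean -> noisy
d_clean = ToyDiscriminator(feat_dim, ind_dim, rng)

enhanced = g_n2c(noisy, indicator)
cycle_loss = cycle_consistency_loss(g_n2c, g_c2n, noisy, clean, indicator)
score = d_clean(enhanced, indicator)
```

In this sketch the same indicator is passed to both mapping directions and to the discriminator, so every network in the cycle is conditioned on identical side information. A real system would replace the linear layers with trained networks and add the adversarial loss terms.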