Speech enhancement aims to recover speech signals with high intelligibility and quality from noisy speech. Recent work has demonstrated the excellent performance of time-domain deep learning methods such as Conv-TasNet. However, these methods can be degraded by the arbitrary waveform scale induced by the scale-invariant signal-to-noise ratio (SI-SNR) loss. This paper proposes a new framework, the Time-domain Speech Enhancement Generative Adversarial Network (TSEGAN), which extends the generative adversarial network (GAN) to the time domain with metric evaluation in order to mitigate the scaling problem and stabilize model training, thereby improving performance. In addition, we provide a new method based on objective function mapping for the theoretical analysis of the performance of Metric GAN, and explain why it is better than the Wasserstein GAN. Experiments demonstrate the effectiveness of the proposed method and illustrate the advantage of Metric GAN.
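The scaling problem attributed to the SI-SNR loss can be illustrated with a minimal sketch. It assumes the commonly used SI-SNR definition (zero-mean signals, with the estimate projected onto the target); the function name `si_snr` and the synthetic signals are illustrative and are not taken from the paper. Because the score is unchanged when the estimate is rescaled, a model trained only on this loss receives no signal about the output amplitude.

```python
import numpy as np


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals (assumed standard definition)."""
    # Remove the mean so the measure ignores DC offsets.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the "signal" component.
    s_target = (np.dot(estimate, target) / (np.dot(target, target) + eps)) * target
    # Whatever is left over is treated as noise.
    e_noise = estimate - s_target
    ratio = (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    return 10.0 * np.log10(ratio)


# Synthetic stand-ins for a clean utterance and an enhanced estimate (hypothetical data).
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
estimate = clean + 0.1 * rng.standard_normal(16000)

base = si_snr(estimate, clean)
scaled = si_snr(5.0 * estimate, clean)  # same estimate, five times louder

# Mean squared error, by contrast, does change with the output scale.
mse_base = np.mean((estimate - clean) ** 2)
mse_scaled = np.mean((5.0 * estimate - clean) ** 2)

print(f"SI-SNR: {base:.2f} dB, after rescaling: {scaled:.2f} dB")
print(f"MSE: {mse_base:.4f}, after rescaling: {mse_scaled:.4f}")
```

In this sketch the two SI-SNR values agree up to numerical precision while the mean squared error grows sharply, which is the kind of unconstrained output scale the abstract refers to.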