In recent years, Generative Adversarial Networks (GANs) have produced significantly improved results in speech enhancement (SE) tasks. However, they are difficult to train. In this work, we introduce several improvements to GAN training schemes that can be applied to most GAN-based SE models. We propose consistency loss functions, which target the inconsistency in the time and time-frequency domains caused by Fourier and inverse Fourier transforms. We also present self-correcting optimization for training a GAN discriminator on SE tasks, which helps avoid "harmful" training directions for parts of the discriminator loss function. We have tested our proposed methods on several state-of-the-art GAN-based SE models and obtained consistent improvements, including new state-of-the-art results on the Voice Bank+DEMAND dataset.
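The notion of time/time-frequency inconsistency can be illustrated with a minimal sketch. It assumes a Hann-windowed STFT with 75% overlap (`n_fft=512` and `hop=128` are illustrative choices, not values from this work). A complex spectrogram is "consistent" if converting it to a waveform with the inverse STFT and analyzing that waveform again returns the same spectrogram. The hypothetical `consistency_loss` below measures the mean absolute gap. This is one common way to quantify STFT inconsistency and is not necessarily the exact loss proposed here.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Frame the signal, apply a periodic Hann window, and take the real FFT."""
    if len(x) < n_fft:
        raise ValueError("signal shorter than one frame")
    win = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)  # shape: (n_frames, n_fft // 2 + 1)


def istft(S, n_fft=512, hop=128):
    """Weighted overlap-add inverse (least-squares estimate of the waveform)."""
    win = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(S, n=n_fft, axis=-1) * win  # synthesis window
    n_frames = S.shape[0]
    length = n_fft + hop * (n_frames - 1)
    y = np.zeros(length)
    wsum = np.zeros(length)
    for i in range(n_frames):
        y[i * hop:i * hop + n_fft] += frames[i]
        wsum[i * hop:i * hop + n_fft] += win ** 2
    # Normalize by the summed squared window; leave samples with zero weight at 0.
    return np.divide(y, wsum, out=np.zeros_like(y), where=wsum > 0)


def consistency_loss(S, n_fft=512, hop=128):
    """Mean |S - STFT(iSTFT(S))|: near zero only for consistent spectrograms."""
    S_proj = stft(istft(S, n_fft, hop), n_fft, hop)
    return float(np.mean(np.abs(S - S_proj)))


rng = np.random.default_rng(0)
x = rng.standard_normal(512 + 128 * 20)
S_real = stft(x)  # STFT of an actual waveform
S_rand = rng.standard_normal(S_real.shape) + 1j * rng.standard_normal(S_real.shape)
print(consistency_loss(S_real), consistency_loss(S_rand))
```

For the STFT of a real waveform the loss is close to zero (floating-point error only), whereas an arbitrary complex array, such as an unconstrained network output in the time-frequency domain, generally yields a much larger value.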