Generative adversarial networks have recently demonstrated outstanding performance in neural vocoding, outperforming the best autoregressive and flow-based models. In this paper, we show that this success can be extended to other tasks of conditional audio generation. In particular, building upon HiFi vocoders, we propose HiFi++, a novel general framework for bandwidth extension and speech enhancement. We show that with an improved generator architecture and simplified multi-discriminator training, HiFi++ performs better than or on par with the state of the art in these tasks while using significantly fewer computational resources. The effectiveness of our approach is validated through a series of extensive experiments.