Neural audio super-resolution models are typically trained on pairs of low- and high-resolution audio signals. These methods achieve highly accurate super-resolution when the acoustic characteristics of the input data are similar to those of the training data, but two challenges remain: quality degrades on out-of-domain data, and training requires paired data. To address these problems, we propose Dual-CycleGAN, a high-quality audio super-resolution method that can utilize unpaired data, based on two connected cycle-consistent generative adversarial networks (CycleGAN). Our method decomposes super-resolution into domain adaptation and resampling processes to handle the acoustic mismatch between the unpaired low- and high-resolution signals. The two processes are then jointly optimized within the CycleGAN framework. Experimental results verify that the proposed method significantly outperforms conventional methods when paired data are not available. Code and audio samples are available from https://chomeyama.github.io/DualCycleGAN-Demo/.
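
As a rough illustration of the decomposition described above, the two stages can be written as a composition of mappings, together with the generic cycle-consistency loss from the original CycleGAN formulation. The symbols $G_{\mathrm{DA}}$, $G_{\mathrm{SR}}$, $G$, $F$, $X$, and $Y$ are notation assumed here for illustration, and the loss shown is the standard CycleGAN term, not necessarily the exact objective used in Dual-CycleGAN.

```latex
% Assumed notation (not taken from the abstract):
%   x            : low-resolution signal from the source domain
%   G_{DA}       : domain adaptation mapping (low-res source -> low-res target)
%   G_{SR}       : resampling / super-resolution mapping (low-res -> high-res)
\hat{y} = G_{\mathrm{SR}}\bigl(G_{\mathrm{DA}}(x)\bigr)

% Generic CycleGAN cycle-consistency loss for a pair of mappings
% G : X -> Y and F : Y -> X trained on unpaired samples x ~ X, y ~ Y:
\mathcal{L}_{\mathrm{cyc}}(G, F) =
  \mathbb{E}_{x \sim X}\bigl[\lVert F(G(x)) - x \rVert_1\bigr]
  + \mathbb{E}_{y \sim Y}\bigl[\lVert G(F(y)) - y \rVert_1\bigr]
```

Under this reading, each of the two connected CycleGANs would contribute one such cycle term alongside its adversarial losses, and joint optimization would minimize their weighted sum; the weighting and any additional loss terms are not specified in the abstract.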