Cycle-consistent generative adversarial network (CycleGAN) and variational autoencoder (VAE) based models have recently gained popularity in non-parallel voice conversion. However, they often suffer from a difficult training process and unsatisfactory results. In this paper, we propose CVC, a contrastive learning-based adversarial approach for voice conversion. Compared to previous CycleGAN-based methods, CVC requires only an efficient one-way GAN training by taking advantage of contrastive learning. In non-parallel one-to-one voice conversion, CVC is on par with or better than CycleGAN and VAE while effectively reducing training time. CVC further demonstrates superior performance in many-to-one voice conversion, enabling conversion from unseen speakers.
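The abstract does not spell out the contrastive objective that CVC uses. As a rough illustration of how a contrastive loss can stand in for a cycle-consistency constraint in one-way training, the sketch below implements a generic InfoNCE loss, which is an assumption here and not necessarily the paper's exact formulation. The function name `info_nce`, the temperature value, and the toy feature vectors are all hypothetical; in a voice conversion setting the query would plausibly be a feature of the converted speech and the positive the feature of the source speech at the same location.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(query, positive, negatives, temperature=0.07):
    """Generic InfoNCE loss for one query (assumed objective, not CVC's confirmed one).

    The loss is the cross-entropy of picking the positive key out of
    the set {positive} + negatives, using cosine similarity as the score.
    """
    q = normalize(query)
    keys = [normalize(positive)] + [normalize(n) for n in negatives]
    logits = [dot(q, k) / temperature for k in keys]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    # The positive key sits at index 0.
    return log_sum_exp - logits[0]


# Toy features: the query is close to `positive` and far from the negatives.
query = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

loss_matched = info_nce(query, positive, negatives)
# Swap in a dissimilar vector as the "positive" to see the loss grow.
loss_mismatched = info_nce(query, negatives[0], [positive, negatives[1]])
print(loss_matched < loss_mismatched)
```

Minimizing such a loss pulls corresponding input and output features together and pushes non-corresponding ones apart, which is one way a single generator can be encouraged to preserve content without a second, inverse generator.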