Although voice conversion (VC) algorithms have achieved remarkable success alongside advances in machine learning, high performance remains difficult to achieve with nonparallel data. In this paper, we propose using a cycle-consistent adversarial network (CycleGAN) for VC training on nonparallel data. A CycleGAN is a generative adversarial network (GAN) originally developed for unpaired image-to-image translation. A subjective evaluation of inter-gender conversion demonstrated that the proposed method significantly outperformed both a method based on the Merlin open-source neural network speech synthesis system (a parallel VC system adapted for our setup) and a GAN-based parallel VC system. This is the first study to show that the performance of a nonparallel VC method can exceed that of state-of-the-art parallel VC methods.