Deep learning-based singing voice synthesis (SVS) systems have been shown to generate singing flexibly and with better quality than conventional statistical parametric methods. However, neural systems are generally data-hungry and struggle to reach reasonable singing quality with the limited publicly available training data. In this work, we explore different data augmentation methods to boost the training of SVS systems, including several strategies customized to SVS based on pitch augmentation and mix-up augmentation. To further stabilize training, we introduce a cycle-consistent training strategy. Extensive experiments on two public singing databases demonstrate that our proposed augmentation methods and the stabilizing training strategy significantly improve performance in both objective and subjective evaluations.
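To make the two augmentation families named in the abstract concrete, the following is a minimal sketch of their generic forms, not the SVS-customized variants the paper proposes. It assumes the score is a frame-level sequence of MIDI note numbers with 0 marking rests, that F0 is given in Hz with 0 marking unvoiced frames, and that mix-up interpolates two equal-length feature sequences with a weight drawn from a Beta distribution. The function names `pitch_shift` and `mixup` and the default `alpha` are hypothetical choices for illustration.

```python
import numpy as np


def pitch_shift(note_midi, f0_hz, semitones):
    """Transpose a score and its F0 contour by the same number of semitones.

    Assumption: note 0 encodes a rest and F0 of 0 Hz encodes an unvoiced frame.
    """
    note_midi = np.asarray(note_midi, dtype=float)
    f0_hz = np.asarray(f0_hz, dtype=float)
    # Shift only sounding notes; rests keep their 0 marker.
    shifted_notes = np.where(note_midi > 0, note_midi + semitones, note_midi)
    # One semitone is a frequency ratio of 2**(1/12); 0 Hz frames stay at 0.
    shifted_f0 = f0_hz * 2.0 ** (semitones / 12.0)
    return shifted_notes, shifted_f0


def mixup(x_a, x_b, y_a, y_b, alpha=0.5, rng=None):
    """Generic mix-up: convex combination of two (input, target) pairs.

    Assumption: all four arrays are already aligned to the same length.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)  # mixing weight in [0, 1]
    x_mix = lam * np.asarray(x_a, dtype=float) + (1.0 - lam) * np.asarray(x_b, dtype=float)
    y_mix = lam * np.asarray(y_a, dtype=float) + (1.0 - lam) * np.asarray(y_b, dtype=float)
    return x_mix, y_mix, lam


# Usage: shift a three-frame score up one octave, then mix two feature pairs.
notes, f0 = pitch_shift([60, 0, 62], [440.0, 0.0, 220.0], semitones=12)
x_mix, y_mix, lam = mixup(np.zeros(4), np.ones(4), np.zeros(4), np.ones(4),
                          rng=np.random.default_rng(0))
```

This sketch transforms only the score and the F0 contour. In a real pipeline the acoustic targets would need a matching pitch shift, which is not shown here.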