Singing voice synthesis (SVS), the task of generating a singing voice from a music score, has drawn much attention in recent years. A central challenge in SVS is that singers have considerable flexibility in pronunciation, so the same music score can be sung in many different ways. Most previous SVS work does not handle the resulting misalignment between the music score and the actual singing well. In this paper, we propose PHONEix, an acoustic feature processing strategy with a phoneme distribution predictor, to narrow the gap between the music score and the singing voice; it can be easily adopted in different SVS systems. Extensive experiments in various settings demonstrate the effectiveness of PHONEix in both objective and subjective evaluations.