Speech synthesis for poetry is challenging due to the specific intonation patterns inherent to poetic speech. In this work, we propose an approach to synthesise poems with near-human naturalness, enabling literary scholars to systematically examine hypotheses on the interplay between text, spoken realisation, and the listener's perception of poems. To meet these special requirements of literary studies, we resynthesise poems by cloning prosodic values from a human reference recitation, and then use fine-grained prosody control in a human-in-the-loop setting to manipulate the synthetic speech, altering the recitation with respect to specific phenomena. We find that fine-tuning our TTS model on poetry captures poetic intonation patterns to a large extent, which benefits prosody cloning and manipulation, and we verify the success of our approach both in an objective evaluation and in human studies.