In human speech, a speaker's attitude cannot be fully conveyed by the textual content alone; it is also carried by intonation. Declarative questions are common in everyday Cantonese conversation and are usually uttered with rising intonation. Vanilla neural text-to-speech (TTS) systems cannot synthesize rising intonation for these sentences because the relevant semantic information is lost. Although it has become common to complement such systems with external language models, their performance in modeling rising intonation has not been well studied. In this paper, we propose to complement a Cantonese TTS model with a BERT-based statement/question classifier. We design different training strategies and compare their performance. We conduct experiments on a Cantonese corpus named CanTTS. Empirical results show that the separate training approach yields the best generalization performance and feasibility.
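To make the proposed setup concrete, the sketch below illustrates one possible form of a BERT-based statement/question classifier; it is not the authors' implementation. The checkpoint name `bert-base-chinese`, the two-label scheme, and the `classify` helper are assumptions introduced for illustration only.

```python
# Minimal sketch (hypothetical) of a BERT-based statement/question classifier.
# Assumption: "bert-base-chinese" stands in for a BERT model usable on Cantonese text.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-chinese"  # assumed checkpoint, not specified in the paper
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=2  # assumed labels: 0 = statement, 1 = declarative question
)

def classify(sentence: str) -> int:
    """Return the predicted label for a sentence (meaningful after fine-tuning)."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1))

# The predicted label could then be passed to the TTS front end as an
# utterance-level feature that conditions the prosody toward rising intonation.
```

Under this reading, "separate training" would correspond to fine-tuning the classifier on statement/question labels independently of the TTS model and feeding its output as an auxiliary input, rather than training both components jointly.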