Chinese dialect text-to-speech (TTS) systems can usually be used only by native linguists. Written Chinese dialects differ from Mandarin in characters, idioms, grammar and usage, so even local speakers may be unable to input a correct sentence. Given Mandarin text as input, a Chinese dialect TTS system can only generate partly meaningful speech with relatively poor prosody and naturalness. To lower the barrier to use and make such systems more practical for commercial applications, we propose a novel Chinese dialect TTS frontend with a translation module. The module converts Mandarin text into idiomatic dialect expressions with correct orthography and grammar, which improves the intelligibility and naturalness of the synthesized speech. For the translation task, we propose a non-autoregressive neural machine translation model that uses a glancing sampling strategy. To our knowledge, this is the first work to incorporate translation into a TTS frontend. Experiments on Cantonese show that, on Mandarin inputs, the proposed frontend improves the MOS of a Cantonese TTS system by 0.27.
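The abstract names glancing sampling but does not describe it. The sketch below illustrates the general idea as popularized by the Glancing Transformer (GLAT). A first parallel decoding pass is compared with the reference. A number of reference tokens proportional to the Hamming distance is then revealed to the decoder for a second pass, and the loss is computed on the remaining positions. This is not the authors' implementation. The helper names `glancing_sample` and `glancing_inputs`, the sampling ratio of 0.5, the round-down rule and the placeholder token id are all assumptions. Token ids stand in for the embeddings a real model would mix.

```python
import random
from typing import List, Optional, Sequence, Tuple


def glancing_sample(
    predicted: Sequence[int],
    reference: Sequence[int],
    ratio: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Tuple[List[bool], int]:
    """Pick reference positions to reveal to the decoder (hypothetical helper).

    Returns a boolean mask over target positions and the Hamming distance
    between the first-pass parallel prediction and the reference.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have equal length")
    rng = rng or random.Random()
    # Hamming distance: how many tokens the first (glance-free) pass got wrong.
    distance = sum(p != r for p, r in zip(predicted, reference))
    # Reveal a number of tokens proportional to that distance (rounded down; assumed).
    num_reveal = int(ratio * distance)
    # Revealed positions are drawn uniformly from the whole target sequence.
    revealed = rng.sample(range(len(reference)), num_reveal)
    mask = [False] * len(reference)
    for i in revealed:
        mask[i] = True
    return mask, distance


def glancing_inputs(
    reference: Sequence[int], mask: Sequence[bool], placeholder: int
) -> List[int]:
    """Second-pass decoder inputs: revealed reference tokens, placeholders elsewhere.

    Real models mix embeddings rather than token ids; ids keep the sketch small.
    The placeholder id is a hypothetical stand-in for the default decoder input.
    """
    return [r if m else placeholder for r, m in zip(reference, mask)]


if __name__ == "__main__":
    rng = random.Random(0)
    pred = [5, 7, 9, 2]  # first-pass parallel prediction (toy token ids)
    ref = [5, 8, 9, 3]   # reference translation (toy token ids)
    mask, dist = glancing_sample(pred, ref, ratio=0.5, rng=rng)
    print(dist, sum(mask))  # 2 wrong tokens -> 1 revealed, prints "2 1"
    print(glancing_inputs(ref, mask, placeholder=0))
    # The training loss would then be computed only where mask is False.
```

The worse the first pass is, the more reference tokens the decoder sees in the second pass. This gives the model an adaptive curriculum: it starts close to conditional masked language modeling and moves toward fully parallel decoding as accuracy improves.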