Grapheme-to-phoneme (G2P) conversion plays an important role in Mandarin Chinese text-to-speech (TTS) systems, where one of the biggest challenges is polyphone disambiguation. Most previous polyphone disambiguation models are trained on manually annotated datasets, and publicly available datasets for the task are scarce. In this paper, we propose a simple back-translation-style data augmentation method for Mandarin Chinese polyphone disambiguation that exploits a large amount of unlabeled text. Inspired by the back-translation technique from machine translation, we build a G2P model to predict the pronunciation of polyphonic characters and a phoneme-to-grapheme (P2G) model to convert the predicted pronunciation back into text. A window-based matching strategy and a multi-model scoring strategy are proposed to judge the correctness of each pseudo-label. We also design a data balance strategy to improve accuracy on typical polyphonic characters whose training data are imbalanced or scarce. Experimental results show the effectiveness of the proposed back-translation-style data augmentation method.
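The round trip described above (G2P to produce a pseudo-label, P2G to map it back to text, then a window-based check) can be illustrated with a minimal sketch. The abstract does not specify the matching rule, so the version below is an assumption: a pseudo-label is kept only when the reconstructed text agrees with the original on the characters within a fixed window around the polyphonic character. The names `window_match`, `pseudo_label`, `toy_p2g`, `good_g2p`, and `bad_g2p` are hypothetical, and the lookup-table "models" are toy stand-ins for trained networks. The multi-model scoring and data balance strategies are not covered.

```python
from typing import Callable, List, Optional

def window_match(original: str, reconstructed: str, pos: int, window: int = 2) -> bool:
    """Return True if both strings agree on the characters within `window` of `pos`.

    Assumed matching rule; the abstract does not give the exact criterion.
    """
    if len(original) != len(reconstructed):
        return False
    lo = max(0, pos - window)
    hi = min(len(original), pos + window + 1)
    return original[lo:hi] == reconstructed[lo:hi]

def pseudo_label(
    sentence: str,
    pos: int,
    g2p: Callable[[str], List[str]],
    p2g: Callable[[List[str]], str],
    window: int = 2,
) -> Optional[str]:
    """Label the character at `pos` with G2P, then keep the label only if
    P2G reconstructs the surrounding text; otherwise return None."""
    phonemes = g2p(sentence)          # one pinyin syllable per character
    if len(phonemes) != len(sentence):
        return None
    reconstructed = p2g(phonemes)     # back-translate pinyin to characters
    if window_match(sentence, reconstructed, pos, window):
        return phonemes[pos]
    return None

# Toy stand-ins (hypothetical): a per-syllable lookup instead of a trained P2G model.
TOY_P2G_TABLE = {"zhang3": "长", "chang2": "常", "da4": "大"}

def toy_p2g(phonemes: List[str]) -> str:
    return "".join(TOY_P2G_TABLE[p] for p in phonemes)

def good_g2p(sentence: str) -> List[str]:
    return ["zhang3", "da4"]   # plausible reading for "长大"

def bad_g2p(sentence: str) -> List[str]:
    return ["chang2", "da4"]   # wrong reading of the polyphonic character

accepted = pseudo_label("长大", 0, good_g2p, toy_p2g)
rejected = pseudo_label("长大", 0, bad_g2p, toy_p2g)
```

In this toy setup the wrong reading is back-translated to a different character, so the window check fails and the pseudo-label is discarded. A real P2G model would be context-dependent, and the window size would need tuning.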