Most Chinese Grapheme-to-Phoneme (G2P) systems employ a three-stage framework: they first transform the input sequence into character embeddings, then obtain linguistic information using language models, and finally predict the phonemes based on the global context of the entire input sequence. However, linguistic knowledge alone is often inadequate. Language models frequently encode overly general sentence structures and fail to cover the specific cases that require phonetic knowledge. In addition, a handcrafted post-processing system is needed to address tone-related problems of the characters, but this system segments word boundaries inconsistently, which in turn degrades the performance of the G2P system. To address these issues, we propose the Reinforcer, which provides a strong inductive bias for language models by emphasizing the phonological information between neighboring characters to help disambiguate pronunciations. Experimental results show that the Reinforcer improves state-of-the-art architectures by a large margin. We also combine the Reinforcer with a large-scale pre-trained model and demonstrate the validity of using neighboring context in knowledge transfer scenarios.
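
To make the role of neighboring characters concrete, the following is a minimal toy sketch of how an adjacent character can select the reading of a polyphonic character. It is not the Reinforcer and not a learned model: the tables `DEFAULT_READING` and `NEIGHBOR_RULES` and the function `predict` are hypothetical names introduced here purely for illustration, and pinyin is written with tone numbers.

```python
# Toy illustration only; this is NOT the Reinforcer described in the abstract.
# All names and tables below are hypothetical and hand-written.

# Fallback reading for each character when no neighbor rule applies.
DEFAULT_READING = {
    "银": "yin2", "行": "xing2", "走": "zou3",
    "长": "chang2", "大": "da4", "城": "cheng2",
}

# Rules keyed by (left_neighbor, char, right_neighbor); None means "any".
NEIGHBOR_RULES = {
    ("银", "行", None): "hang2",   # 银行 "bank"
    (None, "长", "大"): "zhang3",  # 长大 "to grow up"
}


def predict(text):
    """Return one pinyin reading per character, consulting adjacent characters."""
    readings = []
    for i, ch in enumerate(text):
        left = text[i - 1] if i > 0 else None
        right = text[i + 1] if i + 1 < len(text) else None
        reading = (
            NEIGHBOR_RULES.get((left, ch, None))       # rule on the left neighbor
            or NEIGHBOR_RULES.get((None, ch, right))   # rule on the right neighbor
            or DEFAULT_READING.get(ch, "<unk>")        # context-free fallback
        )
        readings.append(reading)
    return readings


if __name__ == "__main__":
    for word in ["银行", "行走", "长大", "长城"]:
        print(word, predict(word))
```

Here the same character 行 is read differently in 银行 and 行走 depending only on what sits next to it, which is the kind of local phonological cue the abstract argues a global language model may overlook.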