This study investigates whether phonological features can serve as input to text-to-speech (TTS) systems to generate native and non-native speech. We present a mapping from ARPABET/pinyin to SAMPA/SAMPA-SC to phonological features, and test whether native, non-native, and code-switched speech can be generated successfully using this mapping. We ran two experiments, one with a small dataset and one with a larger dataset. The results indicate that phonological features are a feasible input representation, although further investigation is needed to improve model performance. The accented output generated by the TTS models also helps in understanding human second language acquisition processes.
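To make the mapping chain concrete, the following is a minimal sketch of a two-stage lookup: a language-specific symbol (ARPABET or pinyin) is first mapped to a SAMPA-style symbol, which is then mapped to a binary phonological feature vector. The table entries, the feature inventory, and the names `ARPABET_TO_SAMPA`, `PINYIN_TO_SAMPA`, `SAMPA_TO_FEATURES`, and `encode` are illustrative assumptions, not the mapping or feature set used in the paper.

```python
from typing import Dict, FrozenSet, List

# Stage 1 (assumed, tiny subset): language-specific symbols -> SAMPA-style symbols.
ARPABET_TO_SAMPA: Dict[str, str] = {"P": "p", "B": "b", "M": "m", "S": "s"}
# Pinyin "b" is a voiceless unaspirated stop, so it maps to "p" here (illustrative).
PINYIN_TO_SAMPA: Dict[str, str] = {"b": "p", "m": "m", "s": "s"}

# Stage 2 (assumed feature inventory): SAMPA-style symbol -> set of active features.
SAMPA_TO_FEATURES: Dict[str, FrozenSet[str]] = {
    "p": frozenset({"consonantal", "labial", "stop"}),
    "b": frozenset({"consonantal", "labial", "stop", "voiced"}),
    "m": frozenset({"consonantal", "labial", "nasal", "voiced"}),
    "s": frozenset({"consonantal", "coronal", "fricative"}),
}

# Fixed feature order so every phone becomes a vector of the same length.
FEATURES: List[str] = sorted(set().union(*SAMPA_TO_FEATURES.values()))


def encode(symbols: List[str], to_sampa: Dict[str, str]) -> List[List[int]]:
    """Map each source symbol to a binary feature vector via the SAMPA-style layer."""
    vectors = []
    for symbol in symbols:
        active = SAMPA_TO_FEATURES[to_sampa[symbol]]
        vectors.append([1 if feature in active else 0 for feature in FEATURES])
    return vectors


# Both languages land in the same feature space, which is what would let a
# single model take mixed (code-switched) input.
english = encode(["B", "M"], ARPABET_TO_SAMPA)
mandarin = encode(["b", "m"], PINYIN_TO_SAMPA)
```

In this sketch, English "M" and pinyin "m" receive identical vectors, while English "B" and pinyin "b" differ only in the voicing feature. Small, interpretable differences of this kind are one reason a shared feature space is attractive for modeling accented speech.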