Automatic pronunciation assessment is an important technology for helping self-directed language learners. While pronunciation quality has multiple aspects, including accuracy, fluency, completeness, and prosody, previous efforts typically model only one aspect (e.g., accuracy) at one granularity (e.g., the phoneme level). In this work, we explore modeling multi-aspect pronunciation assessment at multiple granularities. Specifically, we train a Goodness of Pronunciation feature-based Transformer (GOPT) with multi-task learning. Experiments show that GOPT achieves the best results on speechocean762 with a public automatic speech recognition (ASR) acoustic model trained on LibriSpeech.
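To make the multi-task idea concrete, the following is a minimal sketch of how losses for several aspects at several granularities might be combined into one training objective. It is an illustration only: the abstract does not specify the loss function, so the mean squared error, the equal default weights, the task names (such as `phone_accuracy` and `utt_fluency`), and the scores are all assumptions made for this example, and the Transformer itself is not shown.

```python
def mse(pred, target):
    """Mean squared error between two equal-length, non-empty score lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(preds, targets, weights=None):
    """Weighted sum of per-task MSE losses.

    preds and targets map a task name (one aspect at one granularity)
    to a list of scores. Task names and weights are hypothetical.
    """
    if weights is None:
        # Assumption: every task contributes equally.
        weights = {task: 1.0 for task in preds}
    return sum(weights[task] * mse(preds[task], targets[task]) for task in preds)


# Hypothetical scores for one utterance with three phonemes and one word.
preds = {
    "phone_accuracy": [1.5, 0.5, 2.0],  # one score per phoneme
    "word_accuracy": [8.0],             # one score per word
    "utt_fluency": [7.0],               # one score per utterance
}
targets = {
    "phone_accuracy": [2.0, 0.0, 2.0],
    "word_accuracy": [9.0],
    "utt_fluency": [7.0],
}

loss = multitask_loss(preds, targets)
```

In this sketch, a single scalar sums the errors from all tasks, so a shared model would be pushed to fit every aspect and granularity at once rather than only one of them.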