Computer-Assisted Pronunciation Training (CAPT) plays an important role in language learning. Conventional ASR-based CAPT methods require expensive annotation of the ground-truth pronunciation for supervised training. Moreover, certain non-native sounds fall outside the standard phoneme inventory and cannot be correctly classified into standard phonemes, which makes the annotation process challenging and subjective. In addition, ASR-based CAPT methods give the learner only text-based feedback about a mispronunciation and cannot teach the learner how to pronounce the sentence correctly. To address these limitations, we propose using the acoustic unit (AU) as the intermediary feature for both mispronunciation detection and correction. The proposed method uses the masked AU sequence and the target phonemes to detect erroneous AUs and then corrects them. This allows the method to give the learner speech-based, self-imitating feedback, which makes the CAPT system more useful for teaching pronunciation.
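The mask-then-predict idea described above can be illustrated with a minimal sketch. The abstract does not specify the model, the AU inventory, or the decision rule, so everything here is an assumption made for illustration: the `MASK` token, the `predict` callback signature, the probability threshold, and the toy lookup predictor (which assumes one AU per phoneme and ignores context) are all hypothetical stand-ins for a trained model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

MASK = -1  # hypothetical id for the mask token (assumption, not from the paper)

# Hypothetical predictor interface: given the masked AU sequence, the target
# phonemes, and the masked position, return a distribution over AU ids.
Predictor = Callable[[Sequence[int], Sequence[str], int], Dict[int, float]]


def detect_and_correct(
    aus: Sequence[int],
    phonemes: Sequence[str],
    predict: Predictor,
    threshold: float = 0.5,
) -> Tuple[List[int], List[int]]:
    """Flag AUs the predictor finds unlikely and replace them with its top guess."""
    corrected = list(aus)
    errors: List[int] = []
    for i, observed in enumerate(aus):
        masked = list(aus)
        masked[i] = MASK  # hide one AU at a time
        probs = predict(masked, phonemes, i)
        # Assumed decision rule: low probability of the observed AU means error.
        if probs.get(observed, 0.0) < threshold:
            errors.append(i)
            corrected[i] = max(probs, key=probs.get)
    return errors, corrected


# Toy stand-in for a trained model: one expected AU per phoneme, no context.
TOY_TABLE = {"k": 3, "ae": 7, "t": 5}


def toy_predict(masked: Sequence[int], phonemes: Sequence[str], i: int) -> Dict[int, float]:
    expected = TOY_TABLE[phonemes[i]]
    return {expected: 0.9, (expected + 1) % 10: 0.1}


if __name__ == "__main__":
    errors, corrected = detect_and_correct([3, 2, 5], ["k", "ae", "t"], toy_predict)
    print(errors, corrected)
```

In a real system the corrected AU sequence would presumably be passed to a vocoder or unit-to-speech model to produce the speech-based feedback; that step is outside this sketch.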