To give second language (L2) learners more discriminative feedback that helps them identify their mispronunciations, we propose a method for exaggerated visual-speech feedback in computer-assisted pronunciation training (CAPT). The speech exaggeration is realized by an emphatic speech generation neural network based on Tacotron, while the visual exaggeration is accomplished by ADC Viseme Blending, which increases the Amplitude of movement, extends the phone's Duration, and enhances the color Contrast. User studies show that exaggerated feedback outperforms the non-exaggerated version in helping learners with both pronunciation identification and pronunciation improvement.
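The three ADC operations named in the abstract (Amplitude, Duration, Contrast) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `Keyframe` representation, the `exaggerate` function, the gain parameters, and the choice of scaling color around mid-gray are all hypothetical, and the actual blending method may differ substantially.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Keyframe:
    """Hypothetical viseme keyframe (not the paper's data structure)."""
    time: float                         # seconds from utterance start
    opening: float                      # displacement from neutral pose (neutral = 0.0)
    color: Tuple[float, float, float]   # RGB, each channel in [0, 1]


def _clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def exaggerate(frames: List[Keyframe], start: float, end: float,
               amp_gain: float = 1.5, dur_stretch: float = 1.5,
               contrast: float = 1.5) -> List[Keyframe]:
    """Exaggerate the phone spanning [start, end] (assumed parameters)."""
    # Extra time added by stretching the phone; later keyframes shift by this.
    shift = (end - start) * (dur_stretch - 1.0)
    out = []
    for f in frames:
        if f.time < start:
            out.append(f)  # before the target phone: unchanged
        elif f.time <= end:
            out.append(Keyframe(
                # Duration: stretch time relative to the phone's start.
                time=start + (f.time - start) * dur_stretch,
                # Amplitude: scale displacement away from the neutral pose.
                opening=f.opening * amp_gain,
                # Contrast: push each channel away from mid-gray, then clamp.
                color=tuple(_clamp01(0.5 + (c - 0.5) * contrast)
                            for c in f.color),
            ))
        else:
            out.append(replace(f, time=f.time + shift))  # after: delayed only
    return out


frames = [
    Keyframe(0.0, 0.2, (0.5, 0.5, 0.5)),
    Keyframe(0.1, 0.4, (0.75, 0.25, 0.5)),
    Keyframe(0.3, 0.8, (1.0, 0.0, 0.5)),
    Keyframe(0.5, 0.1, (0.5, 0.5, 0.5)),
]
result = exaggerate(frames, start=0.1, end=0.3,
                    amp_gain=2.0, dur_stretch=2.0, contrast=2.0)
```

In this sketch only the keyframes inside the target phone are amplified and recolored; keyframes after it are merely delayed so that the rest of the animation stays intact.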