We use automatic speech recognition to assess the pronunciation of learners of spoken English based on the authentic intelligibility of their spoken responses, as determined from predictions of transcription correctness made by support vector machine (SVM) classifiers or deep learning neural network models. The numeric features come from the PocketSphinx alignment mode and from many recognition passes that search, in sequence, for the substitution and deletion of each expected phoneme and the insertion of unexpected phonemes. With these features, the SVM models achieve 82 percent agreement with the accuracy of Amazon Mechanical Turk crowdworker transcriptions, up from the 75 percent reported by multiple independent researchers. Using such features with SVM classifier probability prediction models can help computer-aided pronunciation teaching (CAPT) systems provide intelligibility remediation.
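The agreement figure reported above compares model predictions with whether crowdworkers transcribed each response correctly. As a minimal sketch of how such a score could be computed, the function below thresholds predicted intelligibility probabilities and counts matches against crowdworker outcomes. The function name `agreement_rate`, the 0.5 threshold, and the sample values are assumptions made for illustration; they are not taken from the paper, and the feature extraction and SVM training steps are not shown.

```python
from typing import Sequence


def agreement_rate(
    predicted_probs: Sequence[float],
    transcribed_correctly: Sequence[bool],
    threshold: float = 0.5,  # assumed cut-off, not specified in the abstract
) -> float:
    """Fraction of items where the thresholded prediction matches the
    crowdworker outcome (True = the response was transcribed correctly)."""
    if len(predicted_probs) != len(transcribed_correctly):
        raise ValueError("inputs must have the same length")
    if not predicted_probs:
        raise ValueError("inputs must be non-empty")
    matches = sum(
        (p >= threshold) == bool(label)
        for p, label in zip(predicted_probs, transcribed_correctly)
    )
    return matches / len(predicted_probs)


# Made-up values for illustration only; these are not data from the paper.
example_probs = [0.91, 0.35, 0.78, 0.12, 0.66]
example_labels = [True, False, False, False, True]
example_agreement = agreement_rate(example_probs, example_labels)
```

In a CAPT setting, the same probabilities could also be sorted so that the responses least likely to be understood are offered for remediation first; that ordering step is likewise an assumption rather than a described part of the system.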