A common approach to automatic mispronunciation detection recognizes the phonemes produced by a student and compares them to the expected pronunciation of a native speaker. This approach makes two simplifying assumptions: a) phonemes can be recognized from speech with high accuracy, and b) there is a single correct way to pronounce a sentence. These assumptions do not always hold, which can result in a significant number of false mispronunciation alarms. We propose a novel approach to overcome this problem based on two principles: a) taking into account uncertainty in the automatic phoneme recognition step, and b) accounting for the fact that there may be multiple valid pronunciations. We evaluate the model on non-native (L2) English speech of German, Italian and Polish speakers, where it increases the precision of detecting mispronunciations by up to 18% (relative) compared to the common approach.
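To make the contrast concrete, the following is a minimal sketch of the two ideas, not the model proposed here. It assumes phonemes are given as lists of string labels, that the recognizer reports a single confidence score per utterance, and that a simple threshold stands in for proper uncertainty modelling. The names `edit_distance`, `is_mispronounced`, `min_confidence` and `max_errors` are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def is_mispronounced(
    recognized: Sequence[str],
    valid_pronunciations: List[Sequence[str]],
    recognizer_confidence: float,
    min_confidence: float = 0.8,  # assumed threshold, not from the paper
    max_errors: int = 0,          # assumed tolerance, not from the paper
) -> bool:
    """Flag a mispronunciation only if the recognizer is confident AND
    the recognized phonemes match none of the valid pronunciations."""
    if recognizer_confidence < min_confidence:
        # Recognition is too uncertain: avoid raising a false alarm.
        return False
    best = min(edit_distance(recognized, ref) for ref in valid_pronunciations)
    return best > max_errors


# "either" has (at least) two accepted pronunciations.
single_reference = [["iy", "dh", "er"]]
multiple_references = [["iy", "dh", "er"], ["ay", "dh", "er"]]
student = ["ay", "dh", "er"]

# Single-reference comparison raises a false alarm for a valid variant.
baseline_alarm = is_mispronounced(student, single_reference, 0.95)
# Allowing multiple valid pronunciations removes that false alarm.
multi_alarm = is_mispronounced(student, multiple_references, 0.95)
```

With a single reference, the valid variant is flagged as an error; with both references it is accepted. The confidence gate shows, in the crudest possible form, how recognizer uncertainty can suppress alarms that stem from recognition errors rather than from the speaker.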