Automatic speech recognition (ASR) systems promise to deliver objective interpretation of human speech. Practice and recent evidence suggest that state-of-the-art (SotA) ASR systems struggle with speech variation due to gender, age, speech impairment, race, and accent. Many factors can cause bias in an ASR system, e.g., the composition of the training material and articulation differences. Our overarching goal is to uncover bias in ASR systems in order to work towards proactive bias mitigation. This paper systematically quantifies the bias of a SotA ASR system with respect to gender, age, regional accents, and non-native accents. Word error rates are compared across speaker groups, and an in-depth phoneme-level error analysis is conducted to understand where the bias occurs. We focus on bias due to articulation differences in the dataset. Based on our findings, we suggest bias mitigation strategies for ASR development.
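As an illustration of the word error rate comparison mentioned above, the following is a minimal sketch of how per-group WER could be computed. It is not the evaluation pipeline used in this paper: the function names, the group labels, and the toy transcripts are hypothetical, and WER is taken here in its standard form, i.e., the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words, pooled over all utterances of a group.

```python
from collections import defaultdict


def word_edit_distance(ref, hyp):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """Pooled WER per speaker group.

    samples: iterable of (group, reference, hypothesis) strings.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, ref, hyp in samples:
        ref_words = ref.split()
        errors[group] += word_edit_distance(ref_words, hyp.split())
        words[group] += len(ref_words)
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Toy, made-up transcripts purely for illustration (not data from the paper).
toy = [
    ("native", "the cat sat on the mat", "the cat sat on the mat"),
    ("non_native", "the cat sat on the mat", "the cat sat in the mat"),
    ("non_native", "good morning", "good morning"),
]
rates = wer_by_group(toy)
for group, rate in sorted(rates.items()):
    print(f"{group}: WER = {rate:.3f}")
```

Pooling errors and reference words per group, rather than averaging per-utterance WERs, keeps short utterances from dominating the group-level figure. A difference in WER between groups is then one simple indicator of bias; the phoneme-level analysis would require aligned phoneme sequences, which this sketch does not cover.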