Automatic speech recognition (ASR) systems promise to deliver objective interpretation of human speech. Practice and recent evidence suggest that state-of-the-art (SotA) ASR systems struggle with the large variation in speech due to, e.g., gender, age, speech impairment, race, and accent. Many factors can cause bias in an ASR system. Our overarching goal is to uncover bias in ASR systems in order to work towards proactive bias mitigation. This paper is a first step towards this goal and systematically quantifies the bias of a Dutch SotA ASR system against gender, age, regional accents, and non-native accents. Word error rates are compared, and an in-depth phoneme-level error analysis is conducted to understand where bias is occurring. We primarily focus on bias due to articulation differences in the dataset. Based on our findings, we suggest bias mitigation strategies for ASR development.