Automated speaker recognition uses data processing to identify speakers by their voice. Today, automated speaker recognition technologies are deployed on billions of smart devices and in services such as call centres. Despite their wide-scale deployment and known sources of bias in face recognition and natural language processing, bias in automated speaker recognition has not been studied systematically. We present an in-depth empirical and analytical study of bias in the machine learning development workflow of speaker verification, a voice biometric and core task in automated speaker recognition. Drawing on an established framework for understanding sources of harm in machine learning, we show that bias exists at every development stage in the well-known VoxCeleb Speaker Recognition Challenge, including model building, implementation, and data generation. Most affected are female speakers and non-US nationalities, who experience significant performance degradation. Leveraging the insights from our findings, we make practical recommendations for mitigating bias in automated speaker recognition, and outline future research directions.
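The subgroup performance degradation described above is typically measured by comparing a verification error metric, such as the equal error rate (EER), across demographic groups. The following sketch shows one common way this kind of disaggregated evaluation can be done; the trial scores and group names are hypothetical placeholders, not data from the study.

```python
import numpy as np

def eer(scores, labels):
    """Equal error rate: the operating point where the false accept
    rate and false reject rate are (approximately) equal."""
    order = np.argsort(scores)
    scores, labels = np.asarray(scores)[order], np.asarray(labels)[order]
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    # Sweep a threshold over the sorted scores: as it rises,
    # the false reject rate grows and the false accept rate shrinks.
    frr = np.cumsum(labels) / n_pos          # positives scored below threshold
    far = 1 - np.cumsum(1 - labels) / n_neg  # negatives scored at/above threshold
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2

# Hypothetical verification trials (similarity score, same-speaker label),
# grouped by a demographic attribute such as gender or nationality.
trials = {
    "group_a": (np.array([0.9, 0.8, 0.3, 0.2]), np.array([1, 1, 0, 0])),
    "group_b": (np.array([0.7, 0.4, 0.6, 0.3]), np.array([1, 1, 0, 0])),
}
for group, (s, y) in trials.items():
    print(f"{group}: EER = {eer(s, y):.3f}")
```

A gap in EER between groups evaluated this way (for example, between female and male speakers, or US and non-US nationalities) is one concrete signal of the kind of bias the study reports.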