Automated speaker recognition uses data processing to identify speakers by their voice. Today, automated speaker recognition is deployed on billions of smart devices and in services such as call centres. Despite this wide-scale deployment and known sources of bias in related domains like face recognition and natural language processing, bias in automated speaker recognition has not been studied systematically. We present an in-depth empirical and analytical study of bias in the machine learning development workflow of speaker verification, a voice biometric and core task in automated speaker recognition. Drawing on an established framework for understanding sources of harm in machine learning, we show that bias exists at every development stage in the well-known VoxCeleb Speaker Recognition Challenge, including data generation, model building, and implementation. Most affected are female speakers and non-US nationalities, who experience significant performance degradation. Leveraging the insights from our findings, we make practical recommendations for mitigating bias in automated speaker recognition, and outline future research directions.
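To make the notion of subgroup "performance degradation" in speaker verification concrete, the sketch below shows one common way of comparing groups: computing the equal error rate (EER) separately per demographic group from verification trial scores. The group names, score distributions, and trial counts here are synthetic assumptions for illustration only, not data or results from the study.

```python
# Illustrative sketch (synthetic data): comparing per-group equal error rates
# in a speaker verification setting.
import numpy as np
from sklearn.metrics import roc_curve


def equal_error_rate(labels, scores):
    """EER: the operating point where the false acceptance rate
    (impostor accepted) equals the false rejection rate (genuine rejected)."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2.0


rng = np.random.default_rng(0)
# Hypothetical subgroups and number of verification trials per group.
groups = {"group_A": 4000, "group_B": 4000}

for group, n_trials in groups.items():
    # 1 = same-speaker (genuine) trial, 0 = different-speaker (impostor) trial.
    labels = rng.integers(0, 2, size=n_trials)
    # Synthetic similarity scores; more overlap between the genuine and
    # impostor score distributions yields a higher (worse) EER.
    scores = rng.normal(loc=labels * 1.5, scale=1.0)
    print(f"{group}: EER = {equal_error_rate(labels, scores):.3f}")
```

In practice, a gap in EER (or in false acceptance/rejection rates at a shared threshold) between groups is one signal of the kind of bias the study reports.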