Like face recognition, speaker recognition is widely used for voice-based biometric identification across a broad range of industries, including banking, education, recruitment, immigration, law enforcement, healthcare, and well-being. However, while dataset evaluations and audits have improved data practices in computer vision and face recognition, the data practices in speaker recognition have gone largely unquestioned. Our research aims to address this gap by exploring how dataset usage has evolved over time and what implications this has for bias and fairness in speaker recognition systems. Previous studies have demonstrated the presence of historical, representation, and measurement biases in popular speaker recognition benchmarks. In this paper, we present a longitudinal study of speaker recognition datasets used for training and evaluation from 2012 to 2021. We survey close to 700 papers to investigate community adoption of datasets and changes in usage over a crucial period during which speaker recognition approaches shifted to the widespread adoption of deep neural networks. Our study identifies the most commonly used datasets in the field, examines their usage patterns, and assesses the attributes that affect bias, fairness, and other ethical concerns. Our findings suggest areas for further research on the ethics and fairness of speaker recognition technology.