With the ubiquity of smart devices that use speaker recognition (SR) systems to authenticate individuals and personalize their services, the fairness of SR systems has become an important point of focus. In this paper, we study the notion of fairness in recent SR systems based on three popular and relevant definitions, namely Statistical Parity, Equalized Odds, and Equal Opportunity. We examine five popular neural architectures and five commonly used loss functions for training SR systems, evaluating their fairness with respect to gender and nationality groups. Our detailed experiments shed light on this concept and demonstrate that more sophisticated encoder architectures better align with the definitions of fairness. Additionally, we find that the choice of loss function can significantly affect the bias of SR models.
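For reference, these three criteria are commonly formalized as follows for a binary acceptance decision $\hat{Y}$, ground-truth label $Y$, and protected attribute $A$ (e.g., gender or nationality); the notation below is a standard sketch and is not drawn from a specific SR formulation:

% Standard fairness criteria; notation is illustrative, not the paper's own.
\begin{align*}
\text{Statistical Parity:} \quad
  & P(\hat{Y}=1 \mid A=a) = P(\hat{Y}=1 \mid A=b) \\
\text{Equalized Odds:} \quad
  & P(\hat{Y}=1 \mid Y=y,\, A=a) = P(\hat{Y}=1 \mid Y=y,\, A=b),
    \quad y \in \{0,1\} \\
\text{Equal Opportunity:} \quad
  & P(\hat{Y}=1 \mid Y=1,\, A=a) = P(\hat{Y}=1 \mid Y=1,\, A=b)
\end{align*}

In a speaker verification setting, $\hat{Y}$ would correspond to the system's accept/reject decision and $Y$ to whether the trial is a genuine (target-speaker) pair.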