Biometric recognition is used across a variety of applications, from cyber security to border security. Recent research has focused on ensuring that biometric performance (false negatives and false positives) is fair across demographic groups. While there has been significant progress on the development of metrics, the evaluation of performance across groups, and the mitigation of any problems, there has been little work incorporating statistical variation. This matters because differences among groups can arise by chance when no true difference is present; in statistics this is called a Type I error. Differences among groups may be due to sampling variation or to actual differences in system performance, and discriminating between these two sources is essential for sound decision making about fairness and equity. This paper presents two novel statistical approaches for assessing fairness across demographic groups. The first is a bootstrap-based hypothesis test; the second is a simpler test methodology aimed at a non-statistical audience. For the latter, we present the results of a simulation study of the relationship between the margin of error and factors such as the number of subjects, the number of attempts, the correlation between attempts, the underlying false non-match rates (FNMRs), and the number of groups.
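To make the bootstrap idea concrete, the following is a minimal sketch of a pooled-resampling test of whether two groups share a common FNMR. The group sizes, the 2% and 5% rates, and the treatment of attempts as independent Bernoulli outcomes are all illustrative assumptions, not the paper's actual procedure (which, per the abstract, also accounts for correlation between attempts).

```python
# Hypothetical bootstrap test for an FNMR gap between two demographic
# groups; sample sizes and rates below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Simulated genuine-attempt outcomes: 1 = false non-match, 0 = match.
group_a = rng.binomial(1, 0.02, size=500)   # assumed FNMR ~2%
group_b = rng.binomial(1, 0.05, size=500)   # assumed FNMR ~5%

# Observed statistic: absolute difference in empirical FNMRs.
observed = abs(group_a.mean() - group_b.mean())

# Under the null hypothesis both groups share one FNMR, so pool the
# outcomes, resample with replacement, and recompute the gap.
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)
boot = np.empty(2000)
for i in range(2000):
    resample = rng.choice(pooled, size=len(pooled), replace=True)
    boot[i] = abs(resample[:n_a].mean() - resample[n_a:].mean())

# p-value: fraction of null resamples with a gap at least as large
# as the one observed.
p_value = (boot >= observed).mean()
print(f"observed gap={observed:.3f}, bootstrap p-value={p_value:.3f}")
```

A small p-value suggests the gap is unlikely to be sampling variation alone; note this simple sketch ignores within-subject correlation across repeated attempts, which the abstract identifies as a factor affecting the margin of error.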