As facial recognition systems are deployed more widely, scholars and activists have studied their biases and harms. These studies commonly take the form of audits, which compare a facial recognition system's performance across subsets of a dataset defined by metadata labels about the image subjects. Seminal works have found performance discrepancies by gender expression, age, perceived race, skin type, and other attributes. The algorithms examined in these audits generally fall into two categories: academic models and commercial models. We present a detailed comparison between academic and commercial face detection systems, specifically examining robustness to noise. We find that state-of-the-art academic face detection models exhibit demographic disparities in their noise robustness, with statistically significantly worse performance on older individuals and on those who present their gender in a masculine manner. When we compare the size of these disparities to those of commercial models, we conclude that commercial models, despite their relatively larger development budgets and industry-level fairness commitments, are always as biased as or more biased than an academic model.
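The core quantity behind such an audit is simple: the per-group detection rate under a perturbation, and the gap between groups. A minimal sketch of that computation is below; the data, group labels, and detection outcomes are hypothetical placeholders, not results from the study.

```python
import numpy as np

def detection_rate(detected, groups, group):
    """Fraction of images belonging to `group` on which a face was detected."""
    mask = groups == group
    return detected[mask].mean()

# Hypothetical audit data: per-image detection outcomes (1 = face found)
# after a noise corruption, with a metadata label for each image's group.
detected = np.array([1, 1, 0, 1, 0, 0, 1, 1], dtype=float)
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

# The disparity is the difference in detection rates between groups.
disparity = detection_rate(detected, groups, "a") - detection_rate(detected, groups, "b")
print(disparity)  # 0.75 - 0.50 = 0.25
```

In a real audit, the detection outcomes would come from running an academic or commercial face detector on noise-corrupted images, and the per-group gaps would be tested for statistical significance.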