Fairness evaluation in face analysis systems (FAS) typically depends on automatic demographic attribute inference (DAI), which in turn relies on a predefined demographic segmentation. The validity of fairness auditing therefore hinges on the reliability of the DAI process. We first provide a theoretical motivation for this dependency, showing that more reliable DAI yields less biased, lower-variance estimates of FAS fairness. Building on this result, we propose a fully reproducible DAI pipeline that replaces conventional end-to-end training with a modular transfer learning approach, integrating pretrained face recognition encoders with non-linear classification heads. We audit this pipeline along three dimensions: accuracy, fairness, and a newly introduced notion of robustness defined via intra-identity consistency; the proposed robustness metric applies to any demographic segmentation scheme. We benchmark the pipeline on gender and ethnicity inference across multiple datasets and training setups. Our results show that the proposed method outperforms strong baselines, particularly on ethnicity, the more challenging attribute. To promote transparency and reproducibility, we will publicly release the training dataset metadata, full codebase, pretrained models, and evaluation toolkit. This work contributes a reliable foundation for demographic inference in fairness auditing.
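To make the intra-identity consistency notion concrete, the sketch below shows one plausible instantiation: for each identity, measure the fraction of that person's images whose predicted demographic label agrees with the identity's majority prediction, then average over identities. This is a minimal illustrative reading, not the paper's exact formulation; all names (`predictions`, `identities`, `intra_identity_consistency`) are hypothetical.

```python
# Hypothetical sketch of an intra-identity consistency (robustness) metric
# for a DAI classifier. The exact definition used in the paper may differ.
from collections import Counter, defaultdict

def intra_identity_consistency(predictions, identities):
    """Average, over identities, of the fraction of an identity's images
    whose predicted demographic label matches that identity's majority
    prediction. A score of 1.0 means the classifier never contradicts
    itself across images of the same face."""
    per_identity = defaultdict(list)
    for pred, ident in zip(predictions, identities):
        per_identity[ident].append(pred)
    scores = []
    for preds in per_identity.values():
        # Agreement with the most frequent label for this identity.
        majority_count = Counter(preds).most_common(1)[0][1]
        scores.append(majority_count / len(preds))
    return sum(scores) / len(scores)

# Example: identity "a" is predicted consistently, identity "b" flips once.
preds = ["female", "female", "male", "female", "male"]
ids   = ["a",      "a",      "b",    "b",      "b"]
print(intra_identity_consistency(preds, ids))  # (1.0 + 2/3) / 2 ≈ 0.833
```

Because the metric only compares a model's predictions against themselves within an identity, it requires no attribute ground truth and is agnostic to the demographic segmentation scheme, consistent with the claim above.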