As we rely on machine learning (ML) models to make more consequential decisions, the issue of ML models perpetuating or even exacerbating undesirable historical biases (e.g., gender and racial biases) has come to the fore of the public's attention. In this paper, we focus on the problem of detecting violations of individual fairness in ML models. We formalize the problem as measuring the susceptibility of ML models to a form of adversarial attack and develop a suite of inference tools for the adversarial cost function. The tools allow auditors to assess the individual fairness of ML models in a statistically principled way: form confidence intervals for the worst-case performance differential between similar individuals and test hypotheses of model fairness with (asymptotic) non-coverage/Type I error rate control. We demonstrate the utility of our tools in a real-world case study.
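To make the audit idea concrete, below is a minimal, illustrative sketch, not the paper's actual procedure: it treats the "worst-case performance differential" as the average increase in loss when each test point is adversarially perturbed within a small ball (a plain Euclidean ball standing in for a fair metric), and forms a large-sample normal-approximation confidence interval for that average. All names and parameters (`worst_case_differential`, `radius`, the synthetic data) are hypothetical.

```python
# Illustrative sketch of an individual-fairness audit (assumed setup, not the
# paper's exact method): perturb each test point within an L2 ball around it,
# record the worst-case increase in loss, and build a 95% normal-approximation
# confidence interval for the mean increase.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(w, X, y):
    # Per-example logistic loss of a fixed linear model with weights w.
    p = sigmoid(X @ w)
    eps = 1e-12
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def worst_case_differential(w, X, y, radius=0.5, steps=50, lr=0.1):
    # Projected gradient ascent on the loss within an L2 ball of the given
    # radius around each point; returns per-example worst-case loss increases.
    base = logistic_loss(w, X, y)
    X_adv = X.copy()
    for _ in range(steps):
        p = sigmoid(X_adv @ w)
        grad = (p - y)[:, None] * w[None, :]   # d(loss)/d(x) for logistic loss
        X_adv = X_adv + lr * grad
        # Project the perturbation back onto the L2 ball of the given radius.
        delta = X_adv - X
        norms = np.linalg.norm(delta, axis=1, keepdims=True)
        delta = delta * np.minimum(1.0, radius / np.maximum(norms, 1e-12))
        X_adv = X + delta
    return logistic_loss(w, X_adv, y) - base

# Synthetic audit data and a fixed (already trained) linear model.
n, d = 500, 5
X = rng.normal(size=(n, d))
w = rng.normal(size=d)
y = (sigmoid(X @ w + 0.5 * rng.normal(size=n)) > 0.5).astype(float)

diffs = worst_case_differential(w, X, y)
mean, se = diffs.mean(), diffs.std(ddof=1) / np.sqrt(n)
z = 1.96  # two-sided 95% normal quantile
print(f"worst-case differential: {mean:.3f}  "
      f"95% CI: [{mean - z*se:.3f}, {mean + z*se:.3f}]")
```

In this toy setting, a confidence interval lying well above zero would flag a potential individual-fairness violation; the paper's tools replace the heuristic pieces here (the ad hoc metric, the plug-in interval) with a formal adversarial cost function and asymptotically valid inference.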