Even if deployed with the best intentions, machine learning methods can perpetuate, amplify, or even create social biases. Measures of (un-)fairness have been proposed as a way to gauge the (non-)discriminatory nature of machine learning models. However, proxies of protected attributes that cause discriminatory effects remain challenging to address. In this work, we propose a new algorithmic approach that measures group-wise demographic parity violations and allows us to inspect the causes of inter-group discrimination. Our method relies on the novel idea of measuring a model's dependence on the protected attribute in the explanation space, an informative space that enables more sensitive audits than the primary spaces of input data or prediction distributions and allows us to establish theoretical demographic parity auditing guarantees. We provide a mathematical analysis, synthetic examples, and an experimental evaluation on real-world data. We release an open-source Python package with methods, routines, and tutorials.
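As a rough illustration of the idea described above, the sketch below audits a model's dependence on the protected attribute in an explanation space rather than in the input or prediction space, using a classifier two-sample test. This is a minimal sketch under stated assumptions, not the released package's API: the synthetic data, the linear per-feature contributions used as "explanations", and the helper `audit_dependence` are all illustrative.

```python
# Hedged sketch: audit dependence on a protected attribute in the explanation space.
# Assumption: for a linear model, coef_j * (x_j - mean_j) serves as a simple
# stand-in for richer feature attributions (e.g. Shapley values).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic data: x2 is a proxy for the protected attribute `a`.
n = 5000
a = rng.integers(0, 2, size=n)                 # protected attribute (not a model input)
x1 = rng.normal(size=n)                        # legitimate feature
x2 = rng.normal(loc=a, scale=1.0, size=n)      # proxy feature correlated with `a`
X = np.column_stack([x1, x2])
y = (x1 + 0.5 * x2 + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Model under audit (never sees `a` directly).
model = LogisticRegression().fit(X, y)

# Explanation space: per-feature contributions of the linear model.
contributions = model.coef_[0] * (X - X.mean(axis=0))

def audit_dependence(Z, a, seed=0):
    """Classifier two-sample test: AUC of predicting `a` from the rows of Z.

    AUC near 0.5 suggests no detectable dependence on the protected attribute;
    larger AUC signals (proxy-driven) dependence, i.e. a parity-violation signal.
    """
    Z_tr, Z_te, a_tr, a_te = train_test_split(Z, a, test_size=0.5, random_state=seed)
    inspector = LogisticRegression().fit(Z_tr, a_tr)
    return roc_auc_score(a_te, inspector.predict_proba(Z_te)[:, 1])

auc_expl = audit_dependence(contributions, a)                       # explanation space
auc_pred = audit_dependence(model.predict_proba(X)[:, [1]], a)      # prediction space
print(f"AUC (explanation space): {auc_expl:.3f}")
print(f"AUC (prediction space):  {auc_pred:.3f}")
```

Because the inspector is fit on per-feature contributions, its coefficients also indicate which feature drives the dependence (here, the proxy `x2`), which is the kind of cause-inspection the abstract refers to; the actual auditing procedure and guarantees are those of the paper and its package.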