The growing deployment of Artificial Intelligence and Machine Learning models poses potential risks of unfair behavior and, in light of recent regulations, has attracted the attention of the research community. Several researchers have focused on seeking new fairness definitions or on developing approaches to identify biased predictions. However, none have tried to exploit the counterfactual space for this purpose. In that direction, the methodology proposed in this work aims to unveil unfair model behavior using counterfactual reasoning in the fairness under unawareness setting. We define a counterfactual version of equal opportunity, named counterfactual fair opportunity, and introduce two novel metrics that analyze the sensitive information of counterfactual samples. Experimental results on three different datasets show the efficacy of our methodology and metrics, disclosing the unfair behavior of classic machine learning and debiasing models.
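For context, the fairness criterion the paper extends can be stated concretely. Below is a minimal sketch of the standard equal opportunity gap, i.e. the difference in true-positive rates across sensitive groups; the function name, the two-group toy data, and the NumPy setup are illustrative assumptions, not the paper's implementation. Roughly speaking, the counterfactual fair opportunity notion defined in the paper is an analogous criterion evaluated with counterfactual samples.

```python
import numpy as np

def equal_opportunity_gap(y_true, y_pred, sensitive):
    """Absolute gap in true-positive rate (TPR) across sensitive groups.

    Equal opportunity holds when every group has the same TPR;
    a larger gap indicates a less fair classifier.
    """
    tprs = []
    for g in np.unique(sensitive):
        positives = (sensitive == g) & (y_true == 1)  # actual positives in group g
        tprs.append(y_pred[positives].mean())         # fraction correctly predicted
    return max(tprs) - min(tprs)

# Toy usage: two groups ("a", "b") with binary labels and predictions.
y_true = np.array([1, 1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 1])
sensitive = np.array(["a", "a", "a", "b", "b", "b"])
print(equal_opportunity_gap(y_true, y_pred, sensitive))  # 0.5 (TPR 0.5 vs 1.0)
```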