We present counterfactual situation testing (CST), a causal data mining framework for detecting discrimination in classifiers. CST aims to answer, in an actionable and meaningful way, the intuitive question "what would have been the model outcome had the individual, or complainant, been of a different protected status?" It extends the legally grounded situation testing of Thanh et al. (2011) by using counterfactual reasoning to operationalize the notion of fairness given the difference. For any complainant, we find and compare similar protected and non-protected instances in the dataset used by the classifier to construct a control group and a test group; a difference between the decision outcomes of the two groups implies potential individual discrimination. Unlike situation testing, which builds both groups around the complainant, we build the test group around the complainant's counterfactual, generated using causal knowledge. The counterfactual is intended to reflect how changing the protected attribute affects the seemingly neutral attributes used by the classifier, an effect that is taken for granted in many discrimination frameworks. Under CST, we compare similar individuals within each group but dissimilar individuals across the two groups, owing to the possible difference between the complainant and its counterfactual. Evaluating our framework on two classification scenarios, we show that it uncovers a greater number of cases than situation testing, even when the classifier satisfies the counterfactual fairness condition of Kusner et al. (2017).
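The group construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`k_nearest`, `positive_rate`, `outcome_gap`), the Euclidean distance, the toy data, and the hand-supplied counterfactual are all assumptions made for illustration. In particular, the sketch assumes the control group is drawn from protected instances near the complainant and the test group from non-protected instances, and it takes the counterfactual as given rather than deriving it from a causal model.

```python
from math import dist

def k_nearest(center, rows, k):
    """Return the k rows whose feature vectors are closest to `center` (Euclidean)."""
    return sorted(rows, key=lambda r: dist(center, r["x"]))[:k]

def positive_rate(rows):
    """Share of rows that received the positive decision (y == 1)."""
    return sum(r["y"] for r in rows) / len(rows)

def outcome_gap(control_center, test_center, data, k):
    """Positive-decision rate of the test group minus that of the control group.

    Assumption: control = k protected instances nearest `control_center`,
    test = k non-protected instances nearest `test_center`.
    """
    protected = [r for r in data if r["protected"]]
    non_protected = [r for r in data if not r["protected"]]
    control = k_nearest(control_center, protected, k)
    test = k_nearest(test_center, non_protected, k)
    return positive_rate(test) - positive_rate(control)

# Hypothetical toy data: two numeric features, a protected flag, a decision y.
data = [
    {"x": (1.0, 1.0), "protected": True, "y": 0},
    {"x": (1.2, 0.9), "protected": True, "y": 0},
    {"x": (3.0, 3.0), "protected": True, "y": 1},
    {"x": (2.0, 2.0), "protected": False, "y": 1},
    {"x": (2.1, 1.9), "protected": False, "y": 1},
    {"x": (1.0, 1.0), "protected": False, "y": 0},
    {"x": (0.9, 1.1), "protected": False, "y": 0},
]

complainant = (1.0, 1.0)
# Assumed counterfactual: in practice it would come from a causal model that
# propagates the change in the protected attribute to the other features.
counterfactual = (2.0, 2.0)

# Situation testing: both groups are centred on the complainant.
gap_st = outcome_gap(complainant, complainant, data, k=2)
# CST: the test group is centred on the counterfactual instead.
gap_cst = outcome_gap(complainant, counterfactual, data, k=2)
print(gap_st, gap_cst)
```

On this toy data the situation-testing gap is zero while the CST gap is not, because the non-protected instances near the counterfactual received the positive decision and those near the complainant did not. This mirrors, in miniature, the abstract's point that comparing against the counterfactual can surface cases that situation testing misses.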