A variety of fairness constraints have been proposed in the literature to mitigate group-level statistical bias. Their impact has largely been evaluated across population groups defined by sensitive attributes such as race or gender. However, how imposing fairness constraints affects a model at the level of individual training examples remains insufficiently explored. Building on the influence function, a measure that characterizes the impact of a training example on the target model and its predictive performance, this work studies the influence of training examples when fairness constraints are imposed. We find that, under certain assumptions, the influence function with respect to fairness constraints can be decomposed into a kernelized combination of training examples. One promising application of the proposed fairness influence function is to identify suspicious training examples that may cause model discrimination by ranking their influence scores. We demonstrate through extensive experiments that training on a subset of the most influential examples leads to lower fairness violations at the cost of some accuracy.
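For reference, the classical influence function on which this notion builds (in the style of Koh and Liang, 2017) measures how upweighting a training example z perturbs the learned parameters and, via the chain rule, any differentiable evaluation functional. The display below is a minimal sketch of that standard construction under a fairness-violation functional, not the paper's exact decomposition; the symbols H (empirical Hessian at the optimum) and F (a smooth fairness-violation measure, e.g., a surrogate of demographic parity) are introduced here for illustration only.

\[
\mathcal{I}_{\mathrm{params}}(z) \;=\; -\,H_{\hat\theta}^{-1}\,\nabla_\theta \ell(z,\hat\theta),
\qquad
H_{\hat\theta} \;=\; \frac{1}{n}\sum_{i=1}^{n}\nabla_\theta^{2}\,\ell(z_i,\hat\theta),
\]
\[
\mathcal{I}_{\mathcal{F}}(z) \;=\; -\,\nabla_\theta \mathcal{F}(\hat\theta)^{\top}\,H_{\hat\theta}^{-1}\,\nabla_\theta \ell(z,\hat\theta).
\]

Under this sketch, ranking training examples by the magnitude of \(\mathcal{I}_{\mathcal{F}}(z_i)\) yields the ordering used to flag examples that most increase the fairness violation.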