In budget-constrained settings aimed at mitigating unfairness, such as law enforcement, it is essential to prioritize the sources of unfairness before taking real-world measures to mitigate them. Unlike previous works, which only caution against possible discrimination and de-bias data after it has been generated, this work provides a toolkit both to mitigate unfairness during data generation, via the Unfair Edge Prioritization algorithm, and to de-bias data after generation, via the Discrimination Removal algorithm. We assume that a non-parametric Markovian causal model representative of the data-generation procedure is given. Edges emanating from sensitive nodes in the causal graph, such as race, are assumed to be the sources of unfairness. We first quantify the Edge Flow in any edge X -> Y, defined as the belief of observing a specific value of Y due to the influence of a specific value of X along X -> Y. We then quantify Edge Unfairness by formulating a non-parametric model in terms of edge flows. We then prove that the cumulative unfairness towards sensitive groups in a decision, such as race in a bail decision, vanishes when edge unfairness is absent; this result holds even in the non-trivial non-parametric setting, where cumulative unfairness cannot be expressed directly in terms of edge unfairness. We then measure the potential to mitigate cumulative unfairness when edge unfairness is decreased. Based on these measurements, we propose the Unfair Edge Prioritization algorithm, which can then be used by policymakers. We also propose the Discrimination Removal procedure, which de-biases a data distribution while eliminating optimization constraints that grow exponentially in the number of sensitive attributes and the values they take. Extensive experiments validate the theorem and the specifications used to quantify the above measures.
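To make the prioritization step concrete, the sketch below ranks edges emanating from a sensitive node by their mitigation potential and edge unfairness. It is a minimal toy illustration, not the paper's formulation: the names (EdgeScores, prioritize_unfair_edges), the scoring rule, and all numbers are hypothetical, and the edge-unfairness and potential values are assumed to have been computed beforehand from the causal model.

```python
from dataclasses import dataclass

@dataclass
class EdgeScores:
    """Per-edge quantities assumed precomputed from the causal model.

    edge: a directed edge (X, Y) emanating from a sensitive node X.
    unfairness: the edge-unfairness measure for X -> Y.
    potential: the estimated reduction in cumulative unfairness
               obtainable by decreasing this edge's unfairness.
    """
    edge: tuple
    unfairness: float
    potential: float

def prioritize_unfair_edges(scores, budget=None):
    """Rank edges so the highest-impact sources of unfairness come first.

    Edges are sorted by mitigation potential (primary key) and edge
    unfairness (secondary key), both descending; `budget` optionally
    caps how many edges a policymaker can afford to address.
    """
    ranked = sorted(scores, key=lambda s: (s.potential, s.unfairness),
                    reverse=True)
    return ranked if budget is None else ranked[:budget]

# Toy example: edges out of a sensitive node "race" in a bail-decision graph.
if __name__ == "__main__":
    scores = [
        EdgeScores(("race", "prior_arrests"), unfairness=0.31, potential=0.12),
        EdgeScores(("race", "bail_decision"), unfairness=0.22, potential=0.27),
        EdgeScores(("race", "neighborhood"), unfairness=0.15, potential=0.05),
    ]
    for s in prioritize_unfair_edges(scores, budget=2):
        print(s.edge, f"potential={s.potential:.2f}",
              f"unfairness={s.unfairness:.2f}")
```

Under this toy scoring rule, the edge with the largest potential to reduce cumulative unfairness is addressed first, matching the budget-constrained motivation above.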
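The post-generation de-biasing step can be viewed as projecting an observed distribution onto a fairness-constrained set. The sketch below is a generic, minimal stand-in, assuming a binary sensitive attribute S, a binary decision Y, and a single demographic-parity constraint; it does not reproduce the paper's Discrimination Removal procedure or its technique for avoiding the exponential growth of constraints, and the empirical distribution used is fabricated for illustration only.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical empirical joint distribution over (S, Y), indexed p[s, y].
p_obs = np.array([[0.35, 0.15],
                  [0.20, 0.30]])

def kl(q_flat):
    """KL divergence from candidate distribution q to p_obs."""
    q = q_flat.reshape(2, 2)
    return float(np.sum(q * np.log(q / p_obs)))

def parity_gap(q_flat):
    """Demographic-parity constraint: P(Y=1|S=0) - P(Y=1|S=1) = 0."""
    q = q_flat.reshape(2, 2)
    return q[0, 1] / q[0].sum() - q[1, 1] / q[1].sum()

constraints = [
    {"type": "eq", "fun": lambda q: q.sum() - 1.0},  # q is a distribution
    {"type": "eq", "fun": parity_gap},               # fairness constraint
]
bounds = [(1e-6, 1.0)] * 4  # keep probabilities strictly positive

# Find the closest de-biased distribution to the observed one.
res = minimize(kl, p_obs.flatten(), bounds=bounds, constraints=constraints)
q_debiased = res.x.reshape(2, 2)
print("De-biased joint distribution:\n", q_debiased)
```

With multiple sensitive attributes, the number of such equality constraints grows exponentially in the attributes and their values, which is precisely the blow-up the abstract's Discrimination Removal procedure is designed to eliminate.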