Fairness-aware data mining (FADM) aims to prevent algorithms from discriminating against protected groups. The literature has reached an impasse over what constitutes explainable variability as opposed to discrimination. This distinction hinges on a rigorous understanding of the role of proxy variables, i.e., variables associated with both the protected feature and the outcome of interest. We demonstrate that fairness is achieved by ensuring impartiality with respect to sensitive characteristics, and we provide a framework for impartiality that accounts for different perspectives on the data-generating process. In particular, fairness can only be precisely defined in a full-data scenario in which all covariates are observed. We then analyze how such full-data models may be conservatively estimated via regression in partial-data settings. Decomposing the regression estimates yields previously unexplored distinctions between explainable variability and discrimination, illuminating the use of proxy variables in fairness-aware data mining.
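As a minimal illustration of the decomposition step, consider a standard omitted-variable-bias argument under an assumed linear model; the notation here (outcome $y$, protected attribute $a$, observed proxy $x$, unobserved covariate $u$) is hypothetical and not drawn from the paper itself. Suppose the full-data model is
\[
y = \beta_0 + \beta_a a + \beta_x x + \beta_u u + \varepsilon,
\]
but only $(y, a, x)$ are observed. The population least-squares coefficient on $a$ from regressing $y$ on $(a, x)$ alone is then
\[
\tilde{\beta}_a = \beta_a + \beta_u \delta_a,
\]
where $\delta_a$ is the coefficient on $a$ in the auxiliary regression of $u$ on $(a, x)$. The first term is the direct effect of the protected attribute (a candidate for discrimination), while the second captures variability routed through the unobserved covariate that $a$ partially proxies; separating these two terms is one way to read the distinction between discrimination and explainable variability described above.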