Discrimination in machine learning often arises along multiple dimensions (a.k.a. protected attributes); it is then desirable to ensure \emph{intersectional fairness} -- i.e., that no subgroup is discriminated against. It is known that ensuring \emph{marginal fairness} for every dimension independently is not sufficient in general. Due to the exponential number of subgroups, however, directly measuring intersectional fairness from data is impractical. In this paper, our primary goal is to understand in detail the relationship between marginal and intersectional fairness through statistical analysis. We first identify a set of sufficient conditions under which an exact relationship can be obtained. Then, in the general case, we prove high-probability bounds on intersectional fairness that are easily computable from marginal fairness and other meaningful statistical quantities. Beyond their descriptive value, we show that these theoretical bounds can be leveraged to derive a heuristic that improves the approximation and bounds of intersectional fairness by judiciously choosing the protected attributes over which intersectional subgroups are formed. Finally, we test the performance of our approximations and bounds on real and synthetic datasets.
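The claim that marginal fairness does not imply intersectional fairness can be illustrated with a minimal sketch. Assuming demographic parity (equal positive-decision rates) as the fairness criterion, the synthetic data below uses an XOR-style pattern over two binary protected attributes: each attribute looks fair in isolation, yet the intersectional subgroups differ sharply. The attribute names and rates here are hypothetical, chosen only for illustration.

```python
from itertools import product

# Synthetic data: two binary protected attributes (a, b) and a binary
# model decision y_hat. Group sizes are equal. Positive-decision rates
# follow an XOR pattern: subgroups (0,0) and (1,1) get rate 0.8, while
# (0,1) and (1,0) get rate 0.2 (rates are illustrative, not from the paper).
records = []
for a, b in product([0, 1], repeat=2):
    rate = 0.8 if a == b else 0.2
    n = 100
    pos = int(rate * n)
    records += [(a, b, 1)] * pos + [(a, b, 0)] * (n - pos)

def positive_rate(rows):
    return sum(y for *_, y in rows) / len(rows)

# Marginal fairness: compare positive rates across values of each attribute alone.
for idx, name in [(0, "a"), (1, "b")]:
    rates = [positive_rate([r for r in records if r[idx] == v]) for v in (0, 1)]
    print(f"marginal gap on {name}: {abs(rates[0] - rates[1]):.2f}")  # 0.00

# Intersectional fairness: compare positive rates across all 2^2 subgroups.
sub = [positive_rate([r for r in records if (r[0], r[1]) == g])
       for g in product([0, 1], repeat=2)]
print(f"intersectional gap: {max(sub) - min(sub):.2f}")  # 0.60
```

Both marginal gaps are exactly zero, while the intersectional gap is 0.6: this is the gap between what the exponentially many subgroups reveal and what per-attribute audits can see.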