Algorithmic fairness is becoming increasingly important in data mining and machine learning. Among others, a foundational notion is group fairness. The vast majority of existing works on group fairness, with a few exceptions, focus on debiasing with respect to a single sensitive attribute, despite the fact that the co-existence of multiple sensitive attributes (e.g., gender, race, marital status) is commonplace in the real world. As such, methods are needed that can ensure a fair learning outcome with respect to all sensitive attributes of concern simultaneously. In this paper, we study the problem of information-theoretic intersectional fairness (InfoFair), in which statistical parity, a representative group fairness measure, is guaranteed among demographic groups formed by multiple sensitive attributes of interest. We formulate it as a mutual information minimization problem and propose a generic end-to-end algorithmic framework to solve it. The key idea is to leverage a variational representation of mutual information, which considers the variational distribution between learning outcomes and sensitive attributes, as well as the density ratio between the variational and the original distributions. Our proposed framework is generalizable to many different settings, including other statistical notions of fairness, and can handle any learning task equipped with a gradient-based optimizer. Empirical evaluations on the fair classification task with three real-world datasets demonstrate that our proposed framework can effectively debias the classification results with minimal impact on classification accuracy.
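To make the general recipe concrete, the following is a minimal conceptual sketch, not the authors' InfoFair implementation: it trains a task model while penalizing a simple variational surrogate of the mutual information between the learning outcome and the (intersectional) sensitive group, and it omits the density-ratio term described in the abstract. All module and variable names (Classifier, VariationalHead, lambda_fair, etc.) are hypothetical.

```python
# Conceptual sketch of mutual-information-penalized training (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Classifier(nn.Module):
    """Task model: produces class logits from input features."""
    def __init__(self, in_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, num_classes))

    def forward(self, x):
        return self.net(x)

class VariationalHead(nn.Module):
    """Variational distribution q(s | y_hat): predicts the joint sensitive group
    from the task output. With multiple sensitive attributes, the groups are the
    cross-product of their values (i.e., intersectional demographic groups)."""
    def __init__(self, num_classes, num_groups):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(num_classes, 32), nn.ReLU(),
                                 nn.Linear(32, num_groups))

    def forward(self, y_logits):
        return self.net(F.softmax(y_logits, dim=-1))

def fairness_penalty(q_logits, s, group_prior):
    """Plug-in surrogate for I(y_hat; s): how much better q predicts the group s
    from y_hat than the marginal prior p(s) does. Driving this toward zero pushes
    the predictions to be uninformative about the sensitive groups."""
    log_q = F.log_softmax(q_logits, dim=-1).gather(1, s.view(-1, 1)).squeeze(1)
    log_prior = torch.log(group_prior[s])
    return (log_q - log_prior).mean()

def training_step(clf, vhead, opt_clf, opt_v, x, y, s, group_prior, lambda_fair=1.0):
    # (1) Fit the variational head q(s | y_hat) on the current predictions.
    with torch.no_grad():
        y_logits = clf(x)
    v_loss = F.cross_entropy(vhead(y_logits), s)
    opt_v.zero_grad(); v_loss.backward(); opt_v.step()

    # (2) Update the classifier: task loss plus the weighted MI surrogate penalty.
    y_logits = clf(x)
    task_loss = F.cross_entropy(y_logits, y)
    fair_loss = fairness_penalty(vhead(y_logits), s, group_prior)
    loss = task_loss + lambda_fair * fair_loss
    opt_clf.zero_grad(); loss.backward(); opt_clf.step()
    return task_loss.item(), fair_loss.item()
```

In this sketch the variational head is refit at every step so that the penalty tracks how much information the current predictions carry about the intersectional groups; lambda_fair controls the accuracy-fairness trade-off. The paper's actual framework additionally estimates the density ratio between the variational and original distributions, which is not reproduced here.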