Incorporating fairness constructs into machine learning algorithms is a topic of much societal importance and recent interest. Clustering, a fundamental task in unsupervised learning that manifests across a number of web data scenarios, has also been a subject of attention within fair ML research. In this paper, we develop a novel notion of fairness in clustering, called representativity fairness. Representativity fairness is motivated by the need to alleviate the disparity in objects' proximity to their assigned cluster representatives, to aid fairer decision making. We illustrate the importance of representativity fairness in real-world decision making scenarios involving clustering, and provide ways of quantifying objects' representativity and fairness over it. We develop a new clustering formulation, RFKM, that optimizes for representativity fairness alongside clustering quality. Inspired by the $K$-Means framework, RFKM incorporates novel loss terms to formulate an objective function. The RFKM objective and optimization approach guide it towards clustering configurations that yield higher representativity fairness. Through an empirical evaluation over a variety of public datasets, we establish the effectiveness of our method. We illustrate that we are able to significantly improve representativity fairness with only a marginal impact on clustering quality.
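The notion sketched above can be illustrated with a minimal proxy: take each object's representativity as its distance to its assigned cluster representative (centroid, in a $K$-Means setting), and measure fairness over it as the dispersion of those distances across objects. This is a simplified, assumed formulation for illustration only; the function names and the variance-based disparity measure are hypothetical and do not reproduce the paper's exact RFKM objective.

```python
import numpy as np

def representativity(X, centroids, labels):
    """Distance of each object to its assigned cluster representative.

    This is an assumed proxy for per-object representativity: objects
    far from their representative are poorly represented by it.
    """
    return np.linalg.norm(X - centroids[labels], axis=1)

def representativity_disparity(X, centroids, labels):
    """Variance of per-object representativities.

    A hypothetical fairness proxy: lower variance means objects are
    more uniformly close to their representatives, i.e. "fairer" under
    this simplified reading of representativity fairness.
    """
    d = representativity(X, centroids, labels)
    return float(d.var())

# Toy example: two objects equidistant from one centroid have zero
# disparity under this proxy.
X = np.array([[0.0, 0.0], [2.0, 0.0]])
centroids = np.array([[1.0, 0.0]])
labels = np.array([0, 0])
print(representativity(X, centroids, labels))          # [1. 1.]
print(representativity_disparity(X, centroids, labels))  # 0.0
```

A fairness-aware objective in the spirit of RFKM would then trade off the usual $K$-Means reconstruction loss against a loss term penalizing this kind of disparity, steering the optimization toward configurations where no object is left disproportionately far from its representative.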