There has been much recent interest in developing fair clustering algorithms that seek to do justice to the representation of groups defined along sensitive attributes such as race and gender. We observe that clustering algorithms can generate clusters in which different groups are disadvantaged within different clusters. We develop a clustering algorithm, building upon the centroid clustering paradigm pioneered by classical algorithms such as $k$-means, that focuses on mitigating the unfairness experienced by the most-disadvantaged group within each cluster. Our method uses an iterative optimisation paradigm whereby an initial cluster assignment is refined by reassigning objects across clusters so as to benefit the worst-off sensitive group within each cluster. We demonstrate the effectiveness of our method through extensive empirical evaluation, using a novel evaluation metric, on real-world datasets. Specifically, we show that our method significantly enhances cluster-level group representativity fairness at low cost to cluster coherence.
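To make the iterative reassignment idea concrete, the following is a minimal Python sketch of one way such a scheme could look. It is not the paper's algorithm: the function name `fair_reassign`, the trade-off weight `lam`, and the use of a group's representation shortfall as a reassignment bonus are all illustrative assumptions layered on top of a standard $k$-means initialisation.

```python
import numpy as np
from sklearn.cluster import KMeans

def fair_reassign(X, groups, k, n_groups, lam=0.5, iters=10):
    """Illustrative sketch (not the paper's exact method): start from a
    k-means assignment, then iteratively move each point toward clusters
    where its sensitive group is under-represented, traded off against
    distance to the cluster centre via the hypothetical weight `lam`."""
    km = KMeans(n_clusters=k, n_init=10).fit(X)
    labels, centers = km.labels_.copy(), km.cluster_centers_.copy()
    overall = np.bincount(groups, minlength=n_groups) / len(groups)

    for _ in range(iters):
        # Distance of every point to every centre: shape (n, k).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)

        # shortfall[c, g]: how far group g falls below its overall
        # share within cluster c (positive = under-represented).
        shortfall = np.zeros((k, n_groups))
        for c in range(k):
            members = groups[labels == c]
            local = np.bincount(members, minlength=n_groups) / max(len(members), 1)
            shortfall[c] = overall - local

        # Reward assigning each point to clusters where its own group
        # is under-represented; shortfall[:, groups].T has shape (n, k).
        score = dist - lam * shortfall[:, groups].T
        new_labels = score.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # converged: no reassignment improves the score
        labels = new_labels

        # Recompute centres of non-empty clusters, as in a Lloyd's step.
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels
```

Under this sketch, a larger `lam` trades more cluster coherence for better representation of under-represented groups; `lam=0` recovers plain $k$-means reassignment.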