Fair Correlation Clustering in Forests

The study of algorithmic fairness has received growing attention recently. This stems from the awareness that bias in the input data for machine learning systems may result in discriminatory outputs. For clustering tasks, one of the most central notions of fairness is the formalization by Chierichetti, Kumar, Lattanzi, and Vassilvitskii [NeurIPS 2017]. A clustering is said to be fair if each cluster has the same distribution of manifestations of a sensitive attribute as the whole input set. This is motivated by various applications where the objects to be clustered have sensitive attributes that should not be over- or underrepresented. We discuss the applicability of this fairness notion to Correlation Clustering. The existing literature on the resulting Fair Correlation Clustering problem either presents approximation algorithms with poor approximation guarantees or severely limits the possible distributions of the sensitive attribute (often only two manifestations with a 1:1 ratio are considered). Our goal is to understand whether there is hope for better results in between these two extremes. To this end, we consider restricted graph classes, which allow us to characterize the distributions of sensitive attributes for which this form of fairness is tractable from a complexity point of view. While existing work on Fair Correlation Clustering gives approximation algorithms, we focus on exact solutions and investigate whether there are efficiently solvable instances. The unfair version of Correlation Clustering is trivial on forests, but adding fairness creates a surprisingly rich picture of complexities. We give an overview of the distributions and types of forests where Fair Correlation Clustering turns from tractable to intractable. To us, the most surprising insight is that the hardness of Fair Correlation Clustering is not caused by the strictness of the fairness condition.
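To make the fairness condition concrete, the following is a minimal sketch of a checker for it, together with a cost function for a clustering. It is an illustration, not the paper's algorithm. The function names (`color_distribution`, `is_fair`, `correlation_cost`) and the example path are hypothetical. The cost function assumes the common unweighted formulation of Correlation Clustering, which the abstract does not spell out: a clustering pays one unit for every edge between two clusters and one unit for every non-adjacent pair inside a cluster.

```python
from collections import Counter
from fractions import Fraction


def color_distribution(vertices, color):
    """Return the exact share of each sensitive-attribute value among the vertices."""
    counts = Counter(color[v] for v in vertices)
    total = len(vertices)
    return {c: Fraction(n, total) for c, n in counts.items()}


def is_fair(clusters, color):
    """Check that every (non-empty) cluster has the same color distribution as the whole set."""
    all_vertices = [v for cluster in clusters for v in cluster]
    target = color_distribution(all_vertices, color)
    return all(color_distribution(cluster, color) == target for cluster in clusters)


def correlation_cost(clusters, edges):
    """Count disagreements under the ASSUMED unweighted cost:
    edges cut between clusters plus non-adjacent pairs inside a cluster."""
    cluster_of = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    intra_edges = sum(1 for u, v in edges if cluster_of[u] == cluster_of[v])
    inter_edges = len(edges) - intra_edges
    intra_pairs = sum(len(c) * (len(c) - 1) // 2 for c in clusters)
    return inter_edges + (intra_pairs - intra_edges)


# Hypothetical example: a path a-b-c-d (a forest) with two colors in a 1:1 ratio.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
color = {"a": "red", "b": "blue", "c": "red", "d": "blue"}

balanced = [{"a", "b"}, {"c", "d"}]    # each cluster is half red, half blue
unbalanced = [{"a", "c"}, {"b", "d"}]  # one all-red cluster, one all-blue cluster
single = [{"a", "b", "c", "d"}]        # trivially fair, but pays for missing edges

print(is_fair(balanced, color), correlation_cost(balanced, edges))      # True 1
print(is_fair(unbalanced, color), correlation_cost(unbalanced, edges))  # False 5
print(is_fair(single, color), correlation_cost(single, edges))          # True 3
```

Exact fractions are used instead of floats so that distributions are compared without rounding error. Putting all vertices into one cluster is always fair, so a fair clustering always exists; the difficulty lies in finding a fair clustering of minimum cost.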
