We revisit the problem of fair clustering, first introduced by Chierichetti et al., that requires each protected attribute to have approximately equal representation in every cluster; i.e., a balance property. Existing solutions to fair clustering are either not scalable or do not achieve an optimal trade-off between clustering objective and fairness. In this paper, we propose a new notion of fairness, which we call $tau$-fair fairness, that strictly generalizes the balance property and enables a fine-grained efficiency vs. fairness trade-off. Furthermore, we show that simple greedy round-robin based algorithms achieve this trade-off efficiently. Under a more general setting of multi-valued protected attributes, we rigorously analyze the theoretical properties of the our algorithms. Our experimental results suggest that the proposed solution outperforms all the state-of-the-art algorithms and works exceptionally well even for a large number of clusters.
翻译:我们审视了公平分组问题,首先由Chierichetti等人提出,公平分组要求每个受保护的属性在每个组群中拥有大致平等的代表权,即平衡财产。公平分组的现有解决方案要么无法伸缩,要么无法在组合目标和公平之间实现最佳的权衡。在本文中,我们提出了一个新的公平概念,我们称之为美元公平,严格地概括平衡财产,并能够实现微调效率相对于公平权衡。此外,我们展示了简单贪婪的圆柱式算法能够有效地实现这一平衡。在多价值保护属性的更普遍环境下,我们严格分析了我们算法的理论属性。我们的实验结果表明,拟议解决方案超越了所有最先进的算法,甚至对大量组群也非常有效。