Correlation clustering is a ubiquitous paradigm in unsupervised machine learning, where addressing unfairness is a major challenge. Motivated by this, we study Fair Correlation Clustering, where the data points may belong to different protected groups and the goal is to ensure fair representation of all groups across clusters. Our paper significantly generalizes and improves on the quality guarantees of previous work by Ahmadi et al. and Ahmadian et al. as follows.
- We allow the user to specify an arbitrary upper bound on the representation of each group in a cluster.
- Our algorithm allows individuals to have multiple protected features and ensures fairness simultaneously across all of them.
- We prove guarantees for clustering quality and fairness in this general setting. Furthermore, these guarantees improve on the results for the special cases studied in previous work.
Our experiments on real-world data demonstrate that our clustering quality, compared to the optimal solution, is much better than what our theoretical result suggests.
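To make the problem statement concrete, the following is a minimal sketch of the two quantities the abstract refers to: the correlation clustering cost and a per-group upper bound checked across several protected features at once. It is not the paper's algorithm. The function names, the data layout, the choice to express each bound as a maximum fraction of a cluster, and the use of the standard disagreement-counting objective on a complete signed graph are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def disagreements(clusters, positive_edges, nodes):
    """Count disagreements on a complete signed graph (assumed objective):
    similar pairs that are split apart plus dissimilar pairs placed together."""
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    cost = 0
    for u, v in combinations(sorted(nodes), 2):
        similar = frozenset((u, v)) in positive_edges
        together = label[u] == label[v]
        if similar != together:
            cost += 1
    return cost


def is_fair(clusters, features, upper_bounds):
    """Check every cluster against every protected feature.

    features:     {feature_name: {node: group}}            (hypothetical layout)
    upper_bounds: {feature_name: {group: max fraction}}    (assumed to be fractions)
    """
    for cluster in clusters:
        size = len(cluster)
        for name, group_of in features.items():
            counts = Counter(group_of[v] for v in cluster)
            for group, count in counts.items():
                if count > upper_bounds[name][group] * size:
                    return False
    return True


# Toy instance: four points, two similar pairs, two protected features.
nodes = {0, 1, 2, 3}
positive_edges = {frozenset((0, 1)), frozenset((2, 3))}
features = {
    "color": {0: "red", 1: "red", 2: "blue", 3: "blue"},
    "age": {0: "young", 1: "old", 2: "young", 3: "old"},
}
upper_bounds = {
    "color": {"red": 0.5, "blue": 0.5},
    "age": {"young": 0.5, "old": 0.5},
}

unconstrained = [{0, 1}, {2, 3}]   # follows the similar pairs exactly
one_feature = [{0, 2}, {1, 3}]     # balanced on color only
both_features = [{0, 3}, {1, 2}]   # balanced on color and age
```

In this toy instance the clustering that follows the similar pairs has zero disagreements but violates the color bound, while balancing on color alone still violates the age bound. Only the third clustering satisfies both features, at the price of a higher disagreement count, which is the trade-off the quality guarantees in the abstract are about.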