Ensuring fairness in machine learning algorithms is a challenging and essential task. We consider the problem of clustering a set of points while satisfying fairness constraints. While there have been several attempts to capture group fairness in the $k$-clustering problem, fairness at an individual level is relatively less explored. We introduce a new notion of individual fairness in $k$-clustering based on features not necessarily used for clustering. We show that this problem is NP-hard and does not admit a constant factor approximation. Therefore, we design a randomized algorithm that guarantees approximation both in terms of minimizing the clustering distance objective and individual fairness under natural restrictions on the distance metric and fairness constraints. Finally, our experimental results against six competing baselines validate that our algorithm produces individually fairer clusters than the fairest baseline by 12.5% on average while also being less costly in terms of the clustering objective than the best baseline by 34.5% on average.
翻译:确保机器学习算法的公平性是一项具有挑战性和至关重要的任务。 我们认为,在满足公平性限制的同时,将一组点组合起来的问题是一个具有挑战性的重要任务。 虽然在集中问题中曾几次试图抓住群体公平性,但相对而言,个人一级的公平性探索较少。 我们根据不一定用于集中的特征,在集中方面引入了新的个人公平性概念。 我们表明,这一问题是硬性NP,并不接受一个不变的系数近似值。 因此,我们设计了一种随机化算法,它既保证在尽可能减少集中的距离目标,又保证个人公平性,同时在自然限制的距离度量度和公平性限制下。 最后,我们对六个相互竞争的基线的实验结果证实,我们的算法的单个组合比最公平的基线平均高出12.5%,而在平均而言,组合目标的成本比最佳基线低34.5%。