In the application of data clustering to human-centric decision-making systems, such as loan applications and advertisement recommendations, the clustering outcome may discriminate against people from different demographic groups, resulting in unfair outcomes. A natural conflict arises between the cost of clustering (measured by distances to cluster centers) and the balanced representation of all demographic groups across the clusters, leading to a bi-objective optimization problem that is nonconvex and nonsmooth. To determine the complete trade-off between these two competing goals, we design a novel stochastic alternating balance fair $k$-means (SAfairKM) algorithm, which alternates classical mini-batch $k$-means updates with group swap updates. The number of $k$-means updates and the number of swap updates essentially parameterize the weight placed on optimizing each objective function. Our numerical experiments show that the proposed SAfairKM algorithm is robust and computationally efficient in constructing well-spread, high-quality Pareto fronts on both synthetic and real datasets.
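To make the alternating scheme concrete, the sketch below runs a fixed number of mini-batch $k$-means updates followed by a fixed number of swap updates, and repeats. It is a toy illustration, not the authors' SAfairKM implementation. Several details are assumptions the abstract does not specify: there are exactly two demographic groups labeled 0 and 1; "balance" is taken to be the minimum over non-empty clusters of min(n0, n1) / max(n0, n1); the mini-batch step uses the common per-center step size 1 / count; and the swap rule (exchange a majority-group point in the least balanced cluster with an opposite-group point elsewhere, keeping the exchange only if balance does not drop) is hypothetical. All function names (`kmeans_step`, `swap_step`, `safairkm_sketch`, `trade_off`) are illustrative.

```python
import random


def sq_dist(p, q):
    # Squared Euclidean distance between two points stored as tuples.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centers):
    # Index of the center closest to point p.
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def clustering_cost(points, centers, assign):
    # k-means objective: sum of squared distances to the assigned centers.
    return sum(sq_dist(p, centers[a]) for p, a in zip(points, assign))


def group_counts(assign, groups, k):
    # counts[j][g] = number of points of group g (0 or 1) in cluster j.
    counts = [[0, 0] for _ in range(k)]
    for a, g in zip(assign, groups):
        counts[a][g] += 1
    return counts


def balance(assign, groups, k):
    # Assumed two-group balance: min over non-empty clusters of
    # min(n0, n1) / max(n0, n1); 1.0 is perfectly balanced, 0.0 is one-sided.
    ratios = [min(c) / max(c) for c in group_counts(assign, groups, k) if sum(c) > 0]
    return min(ratios) if ratios else 0.0


def kmeans_step(points, centers, counts, assign, batch_size, rng):
    # One mini-batch k-means update with per-center step size 1 / count.
    batch = rng.sample(range(len(points)), batch_size)
    cached = {i: nearest(points[i], centers) for i in batch}
    for i, j in cached.items():
        assign[i] = j
        counts[j] += 1
        eta = 1.0 / counts[j]
        centers[j] = tuple((1 - eta) * c + eta * x for c, x in zip(centers[j], points[i]))


def swap_step(groups, assign, k, rng):
    # Hypothetical group swap: exchange the clusters of a majority-group point
    # in the least balanced cluster and an opposite-group point from another
    # cluster; keep the swap only if overall balance does not decrease.
    counts = group_counts(assign, groups, k)
    nonempty = [j for j in range(k) if sum(counts[j]) > 0]
    worst = min(nonempty, key=lambda j: min(counts[j]) / max(counts[j]))
    major = 0 if counts[worst][0] >= counts[worst][1] else 1
    out_pool = [i for i, a in enumerate(assign) if a == worst and groups[i] == major]
    in_pool = [i for i, a in enumerate(assign) if a != worst and groups[i] != major]
    if not out_pool or not in_pool:
        return False
    i, m = rng.choice(out_pool), rng.choice(in_pool)
    before = balance(assign, groups, k)
    assign[i], assign[m] = assign[m], assign[i]
    if balance(assign, groups, k) < before:
        assign[i], assign[m] = assign[m], assign[i]  # revert a harmful swap
        return False
    return True


def safairkm_sketch(points, groups, k, n_kmeans, n_swap, n_outer=20, batch_size=8, seed=0):
    # Alternate n_kmeans mini-batch k-means updates with n_swap swap updates.
    rng = random.Random(seed)
    centers = [points[i] for i in rng.sample(range(len(points)), k)]
    assign = [nearest(p, centers) for p in points]
    counts = [0] * k
    for _ in range(n_outer):
        for _ in range(n_kmeans):
            kmeans_step(points, centers, counts, assign, batch_size, rng)
        for _ in range(n_swap):
            swap_step(groups, assign, k, rng)
    return centers, assign


def trade_off(points, groups, k, schedules, seed=0):
    # Each (n_kmeans, n_swap) pair weights the two objectives differently.
    front = []
    for n_kmeans, n_swap in schedules:
        centers, assign = safairkm_sketch(points, groups, k, n_kmeans, n_swap, seed=seed)
        front.append((clustering_cost(points, centers, assign), balance(assign, groups, k)))
    return front


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [(rng.gauss(cx, 0.5), rng.gauss(0.0, 0.5)) for cx in (0.0, 5.0) for _ in range(30)]
    grp = [0 if i < 25 or i >= 55 else 1 for i in range(60)]  # skewed groups per blob
    schedules = [(5, 0), (5, 5), (1, 20)]
    for schedule, (cost, bal) in zip(schedules, trade_off(pts, grp, 2, schedules)):
        print(schedule, round(cost, 2), round(bal, 3))
```

Sweeping the `(n_kmeans, n_swap)` schedule and recording each run's (cost, balance) pair is one simple way to trace an approximate trade-off curve. Schedules dominated by $k$-means updates tend toward low cost, and schedules dominated by swap updates tend toward higher balance. Because the swaps here change assignments without moving the centers, the reported cost reflects the current centers, not re-fitted centroids.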