Traditionally, clustering algorithms focus on partitioning the data into groups of similar instances. The similarity objective, however, is not sufficient in applications where a fair-representation of the groups in terms of protected attributes like gender or race, is required for each cluster. Moreover, in many applications, to make the clusters useful for the end-user, a balanced cardinality among the clusters is required. Our motivation comes from the education domain where studies indicate that students might learn better in diverse student groups and of course groups of similar cardinality are more practical e.g., for group assignments. To this end, we introduce the fair-capacitated clustering problem that partitions the data into clusters of similar instances while ensuring cluster fairness and balancing cluster cardinalities. We propose a two-step solution to the problem: i) we rely on fairlets to generate minimal sets that satisfy the fair constraint and ii) we propose two approaches, namely hierarchical clustering and partitioning-based clustering, to obtain the fair-capacitated clustering. The hierarchical approach embeds the additional cardinality requirements during the merging step while the partitioning-based one alters the assignment step using a knapsack problem formulation to satisfy the additional requirements. Our experiments on four educational datasets show that our approaches deliver well-balanced clusters in terms of both fairness and cardinality while maintaining a good clustering quality.
翻译:传统上,集群算法侧重于将数据分成相似的情况组。不过,相似性的目标不足以满足每个组群在应用中要求各组群在性别或种族等受保护属性方面有公平的代表性;此外,在许多应用中,为使集群组群对最终用户有用,各组群之间需要有平衡的基点。我们的动机来自教育领域,研究显示,学生在不同的学生群体和相似的基点课程组群中可能学习得更好,例如,对于群体任务,这种类集法更切合实际。为此,我们引入公平的能力组群问题,将数据分成相似的情况组群,同时确保集群公平性和平衡集群的基点。我们提出了解决问题的两步解决办法:(一) 我们依靠公平性组群集来创造最起码的组合,满足公平的制约;(二) 我们提出两种办法,即分级组群群和基于分区的集群,以获得公平能力的组合。分级法在合并阶段中包含了额外的基点要求,而基于分层组群群群群群群化的一个步骤则用 knapsackackmack 来改变指派步骤,同时确保集群的公平性和平衡性,同时在基础组群群群集中显示我们的四项质量要求。