We consider the problem of diversity-enhancing clustering, i.e., developing clustering methods that produce clusters that favour diversity with respect to a set of protected attributes such as race, sex, age, etc. In the context of fair clustering, diversity plays a major role when fairness is understood as demographic parity. To promote diversity, we introduce perturbations to the distance in the unprotected attributes that account for the protected attributes, in a way that resembles the attraction-repulsion of charged particles in physics. These perturbations are defined through dissimilarities with a tractable interpretation. Cluster analysis based on attraction-repulsion dissimilarities penalizes homogeneity of the clusters with respect to the protected attributes and leads to an improvement in diversity. An advantage of our approach, which falls into a pre-processing set-up, is its compatibility with a wide variety of clustering methods and with non-Euclidean data. We illustrate the use of our procedures with both synthetic and real data and discuss the relation between diversity, fairness, and cluster structure. Our procedures are implemented in an R package freely available at https://github.com/HristoInouzhe/AttractionRepulsionClustering.
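To make the pre-processing idea concrete, the following is a minimal sketch, not the dissimilarities defined in the paper or the API of the R package. It assumes a hypothetical multiplicative form: for a pair of distinct points, the base distance on the unprotected attributes is scaled by `1 + u` when the two points share the same protected value (repulsion) and by `1 - v` when they differ (attraction). The names `attraction_repulsion`, `average_linkage`, `u`, and `v` are illustrative assumptions. Average-linkage agglomerative clustering stands in for "a wide variety of clustering methods", since it only needs a precomputed dissimilarity matrix.

```python
import math
from itertools import combinations


def attraction_repulsion(X, S, u=1.0, v=0.5, base=math.dist):
    """Hypothetical attraction-repulsion dissimilarity matrix.

    X: points in the unprotected attributes; S: protected value per point.
    Same protected value -> distance scaled by (1 + u)  (repulsion).
    Different protected value -> distance scaled by (1 - v)  (attraction).
    """
    if u < 0 or not 0 <= v < 1:
        raise ValueError("need u >= 0 and 0 <= v < 1 to keep dissimilarities nonnegative")
    n = len(X)
    D = [[0.0] * n for _ in range(n)]  # zero diagonal, symmetric fill below
    for i, j in combinations(range(n), 2):
        factor = 1 + u if S[i] == S[j] else 1 - v
        D[i][j] = D[j][i] = base(X[i], X[j]) * factor
    return D


def average_linkage(D, k):
    """Naive average-linkage clustering on a precomputed dissimilarity matrix."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Mean dissimilarity over all cross-cluster pairs.
            total = sum(D[i][j] for i in clusters[a] for j in clusters[b])
            avg = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Toy data: two tight groups in the unprotected attribute, each homogeneous in S.
X = [(0.0,), (1.0,), (10.0,), (11.0,)]
S = ["a", "a", "b", "b"]

# u = v = 0 recovers the plain Euclidean distance: homogeneous clusters.
print(average_linkage(attraction_repulsion(X, S, u=0, v=0), 2))    # [[0, 1], [2, 3]]
# Strong repulsion/attraction: each cluster mixes both protected values.
print(average_linkage(attraction_repulsion(X, S, u=10, v=0.5), 2))  # [[0, 3], [1, 2]]
```

Setting `u = v = 0` leaves the original geometry untouched, so the strength of the perturbation acts as a knob trading cluster structure in the unprotected attributes for diversity in the protected ones. Because the clustering step sees only the matrix `D`, `base` could be swapped for any non-Euclidean dissimilarity without changing the rest of the pipeline.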