Clustering is a fundamental task in unsupervised learning that aims to group a dataset into clusters of similar objects. There has been recent interest in embedding normative considerations around fairness within clustering formulations. In this paper, we propose 'local connectivity' as a crucial factor in assessing membership desert in centroid clustering. We use local connectivity to refer to the support that an object's local neighborhood offers for its membership in the cluster in question. We motivate the need to consider the local connectivity of objects in cluster assignment, and provide ways to quantify local connectivity in a given clustering. We then exploit concepts from density-based clustering to devise LOFKM, a clustering method that seeks to deepen local connectivity in clustering outputs while staying within the framework of centroid clustering. Through an empirical evaluation over real-world datasets, we show that LOFKM achieves notable improvements in local connectivity at reasonable costs to clustering quality, illustrating the effectiveness of the method.