Spectral clustering is popular among practitioners and theoreticians alike. While performance guarantees for spectral clustering are well understood, recent studies have focused on enforcing ``fairness'' in clusters, requiring them to be ``balanced'' with respect to a categorical sensitive node attribute (e.g. the race distribution in clusters must match the race distribution in the population). In this paper, we consider a setting where sensitive attributes indirectly manifest in an auxiliary \textit{representation graph} rather than being directly observed. This graph specifies node pairs that can represent each other with respect to sensitive attributes and is observed in addition to the usual \textit{similarity graph}. Our goal is to find clusters in the similarity graph while respecting a new individual-level fairness constraint encoded by the representation graph. We develop variants of unnormalized and normalized spectral clustering for this task and analyze their performance under a \emph{fair} planted partition model induced by the representation graph. This model uses both the cluster membership of the nodes and the structure of the representation graph to generate random similarity graphs. To the best of our knowledge, these are the first consistency results for constrained spectral clustering under an individual-level fairness constraint. Numerical results corroborate our theoretical findings.
翻译:在实践者和理论学家中,光谱群集十分受欢迎。 虽然光谱群集的性能保障得到了很好的理解, 最近的研究侧重于在群集中执行“公平性”的“功能保障 ”, 要求它们在绝对敏感节点属性(例如,群集中的种族分布必须与人口种族分布相匹配) 。 在本文中, 我们考虑一个环境, 敏感属性间接表现在辅助 \ textit{ 代表图中, 而不是直接观察。 此图指定了在敏感属性方面可以相互代表的节点配对, 并观察到通常的\ textit{ 相似性图表 。 我们的目标是在类似图形中找到群, 同时尊重由代表图编码的新的个人级别公平性约束。 我们为这项任务开发了非正常和正常的光谱组合变量, 并在 代表图中引入的种植分流模型, 使用节点的组组成和代表图的结构来生成随机的相似性图表 。 我们的目标是, 在最可靠的光谱层中, 这些是我们了解的单个分析结果 。