Research interest and industrial effort are converging on modeling fairness and correcting algorithmic bias in machine learning. In this paper, we present a scalable algorithm for spectral clustering (SC) with group fairness constraints. Group fairness, also known as statistical parity, requires that each protected group be represented in every cluster in the same proportion as in the full data set. Although the FairSC algorithm (Kleindessner et al., 2019) can find fairer clusterings, it incurs high computational costs because its kernels explicitly compute nullspaces and square roots of dense matrices. We present a new formulation of the underlying spectral computation that incorporates nullspace projection and Hotelling's deflation, so that the resulting algorithm, called s-FairSC, involves only sparse matrix-vector products and can fully exploit the sparsity of the fair SC model. Experimental results on the modified stochastic block model demonstrate that s-FairSC is comparable with FairSC in recovering fair clusterings, while being faster by a factor of 12 for moderate model sizes. s-FairSC is further shown to be scalable in the sense that its computational cost increases only marginally compared with SC without fairness constraints.
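To make the idea of combining nullspace projection with Hotelling's deflation concrete, the following is a minimal sketch, not the authors' implementation. It assumes the unnormalized Laplacian L = D - W, fairness constraints of the form F^T H = 0 (each column of F is a group indicator minus that group's overall proportion), and at least two protected groups. The function name `fair_embedding` and the shift `sigma` are hypothetical choices made for illustration. The operator P L P + sigma (I - P), where P projects onto the nullspace of F^T, is applied only through matrix-vector products, so the sparse Laplacian is never densified.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, eigsh


def fair_embedding(W, groups, k):
    """Sketch: k smallest eigenpairs of P L P + sigma (I - P), matrix-free.

    W      : sparse symmetric adjacency matrix (n x n)
    groups : length-n array of protected-group labels (at least two groups)
    k      : number of eigenvectors (clusters) requested
    """
    n = W.shape[0]
    degrees = np.asarray(W.sum(axis=1)).ravel()
    L = sp.diags(degrees) - W  # sparse unnormalized graph Laplacian

    # Constraint matrix F: one column per group except the last; each column
    # is the group indicator minus the group's proportion in the whole data set.
    labels = np.unique(groups)
    F = np.column_stack(
        [(groups == g).astype(float) - np.mean(groups == g) for g in labels[:-1]]
    )
    Q, _ = np.linalg.qr(F)  # thin QR: orthonormal basis of range(F), n x (h-1)

    def project(x):
        # Orthogonal projection onto the nullspace of F^T.
        return x - Q @ (Q.T @ x)

    # Assumed shift: the eigenvalues of L are at most 2 * max degree, so this
    # moves the unwanted directions in range(F) above the wanted spectrum.
    sigma = 2.0 * degrees.max() + 1.0

    def matvec(x):
        x = np.asarray(x).ravel()
        px = project(x)
        # Projected Laplacian plus the deflation term sigma * (I - P) x.
        return project(L @ px) + sigma * (x - px)

    op = LinearOperator((n, n), matvec=matvec, dtype=float)
    vals, vecs = eigsh(op, k=k, which="SA")  # smallest algebraic eigenvalues
    return vals, vecs
```

As in standard SC, the rows of the returned eigenvector matrix would then be passed to k-means to obtain the cluster labels. Because every returned eigenvector lies in the nullspace of F^T, the embedding satisfies the relaxed group-fairness constraint by construction.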