There are synergies of research interests and industrial efforts in modeling fairness and correcting algorithmic bias in machine learning. In this paper, we present a scalable algorithm for spectral clustering (SC) with group fairness constraints. Group fairness is also known as statistical parity where in each cluster, each protected group is represented with the same proportion as in the entirety. While FairSC algorithm (Kleindessner et al., 2019) is able to find the fairer clustering, it is compromised by high costs due to the kernels of computing nullspaces and the square roots of dense matrices explicitly. We present a new formulation of underlying spectral computation by incorporating nullspace projection and Hotelling's deflation such that the resulting algorithm, called s-FairSC, only involves the sparse matrix-vector products and is able to fully exploit the sparsity of the fair SC model. The experimental results on the modified stochastic block model demonstrate that s-FairSC is comparable with FairSC in recovering fair clustering. Meanwhile, it is sped up by a factor of 12 for moderate model sizes. s-FairSC is further demonstrated to be scalable in the sense that the computational costs of s-FairSC only increase marginally compared to the SC without fairness constraints.
翻译:研究机器学习中建模公平性和纠正算法偏差的研究兴趣和工业努力存在协同作用。在本文中,我们提出了一种具有群组公平性约束的可扩展谱聚类算法。群组公平性也被称为统计公正性,其中在每个聚类中,每个受保护的群体以与整体相同的比例表示。虽然FairSC算法(Kleindessner等人,2019)能够找到更公平的聚类,但由于计算零空间和显式计算稠密矩阵的平方根的核心而成本高。我们提出了基础谱计算的新表述,通过结合零空间投影和Hotelling的缩放,从而得到新的算法,称为s-FairSC,它只涉及稀疏矩阵-向量乘积,并能充分利用公平SC模型的稀疏性。修改后的随机块模型上的实验结果表明,s-FairSC在恢复公平聚类方面与FairSC相当。同时,对中等模型大小进行加速12倍。进一步证明了s-FairSC在计算成本方面具有可扩展性,与不带公平性约束的SC相比,s-FairSC的计算成本只会略微增加。