This paper defines fair principal component analysis (PCA) as minimizing the maximum mean discrepancy (MMD) between dimensionality-reduced conditional distributions of different protected classes. The incorporation of MMD naturally leads to an exact and tractable mathematical formulation of fairness with good statistical properties. We formulate the problem of fair PCA subject to MMD constraints as a non-convex optimization over the Stiefel manifold and solve it using the Riemannian Exact Penalty Method with Smoothing (REPMS; Liu and Boumal, 2019). Importantly, we provide local optimality guarantees and explicitly show the theoretical effect of each hyperparameter in practical settings, extending previous results. Experimental comparisons based on synthetic and UCI datasets show that our approach outperforms prior work in explained variance, fairness, and runtime.
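To make the fairness criterion concrete, the following minimal sketch estimates the squared MMD between two protected groups after projecting them onto an orthonormal basis. It is an illustration under stated assumptions, not the paper's method: the Gaussian (RBF) kernel, the bandwidth `sigma`, the biased V-statistic estimator, the synthetic data, and the helper names `rbf_kernel`, `mmd2`, and `pca_basis` are all choices made here. It only evaluates the discrepancy for a given basis; it does not implement REPMS or any optimization over the Stiefel manifold.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between rows of A and rows of B (assumed kernel choice)."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mmd2(X0, X1, V, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between the projected groups X0 V and X1 V."""
    Z0, Z1 = X0 @ V, X1 @ V
    return (rbf_kernel(Z0, Z0, sigma).mean()
            + rbf_kernel(Z1, Z1, sigma).mean()
            - 2.0 * rbf_kernel(Z0, Z1, sigma).mean())

def pca_basis(X, d):
    """Top-d principal directions of X as an orthonormal p x d matrix (a point on the Stiefel manifold)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T

# Synthetic data (hypothetical): the two groups differ only by a mean shift in coordinate 0.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(200, 5))
X1 = rng.normal(size=(200, 5))
X1[:, 0] += 3.0
X = np.vstack([X0, X1])

V_pca = pca_basis(X, 2)        # standard PCA basis, ignores group membership
V_alt = np.eye(5)[:, 1:3]      # orthonormal basis that avoids the shifted coordinate

mmd_pca = mmd2(X0, X1, V_pca)
mmd_alt = mmd2(X0, X1, V_alt)
print("MMD^2 under standard PCA basis:", mmd_pca)
print("MMD^2 under alternative basis: ", mmd_alt)
```

In this toy setting the standard PCA basis picks up the direction along which the groups differ, so the projected distributions remain distinguishable, whereas a basis orthogonal to that direction yields a much smaller discrepancy. A fair PCA formulation of the kind described above trades off explained variance against this quantity.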