A novel approach for Fair Principal Component Analysis based on eigendecomposition

Principal component analysis (PCA), a ubiquitous dimensionality reduction technique in signal processing, searches for a projection matrix that minimizes the mean squared error between the reduced dataset and the original one. Since classical PCA is not tailored to address concerns related to fairness, its application to actual problems may lead to disparity in the reconstruction errors of different groups (e.g., men and women, whites and blacks, etc.), with potentially harmful consequences such as the introduction of bias towards sensitive groups. Although several fair versions of PCA have been proposed recently, there still remains a fundamental gap in the search for algorithms that are simple enough to be deployed in real systems. To address this, we propose a novel PCA algorithm which tackles fairness issues by means of a simple strategy comprising a one-dimensional search which exploits the closed-form solution of PCA. As attested by numerical experiments, the proposal can significantly improve fairness with a very small loss in the overall reconstruction error and without resorting to complex optimization schemes. Moreover, our findings are consistent in several real situations as well as in scenarios with both unbalanced and balanced datasets.
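The abstract does not spell out the search procedure, so the following is only a minimal sketch of one way a one-dimensional search can sit on top of the closed-form PCA solution. It assumes two groups, blends their second-moment matrices with a single weight `t`, takes the top-`k` eigenvectors of the blend, and keeps the `t` with the smallest gap between the groups' reconstruction errors. The function names, the grid search, and the gap criterion are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def top_k_eigvecs(S, k):
    """Top-k eigenvectors (as columns) of a symmetric matrix S."""
    vals, vecs = np.linalg.eigh(S)           # eigenvalues in ascending order
    return vecs[:, np.argsort(vals)[::-1][:k]]

def group_error(Xg, W):
    """Mean squared reconstruction error of centered rows Xg under projection W."""
    residual = Xg - Xg @ W @ W.T
    return float(np.mean(np.sum(residual ** 2, axis=1)))

def fair_pca_1d_search(XA, XB, k, n_grid=101):
    """Hypothetical helper: grid search over one blending weight t in [0, 1]."""
    X = np.vstack([XA, XB])
    mu = X.mean(axis=0)
    XA, XB = XA - mu, XB - mu                # center both groups on the pooled mean
    SA = XA.T @ XA / len(XA)                 # per-group second-moment matrices
    SB = XB.T @ XB / len(XB)
    t_pca = len(XA) / len(X)                 # this weight recovers classical pooled PCA
    grid = np.append(np.linspace(0.0, 1.0, n_grid), t_pca)
    best = None
    for t in grid:
        W = top_k_eigvecs(t * SA + (1.0 - t) * SB, k)   # closed-form PCA step
        err_a, err_b = group_error(XA, W), group_error(XB, W)
        gap = abs(err_a - err_b)
        if best is None or gap < best["gap"]:
            best = {"t": float(t), "W": W, "err_A": err_a, "err_B": err_b, "gap": gap}
    return best

# Synthetic, unbalanced toy data (assumed for illustration only).
rng = np.random.default_rng(0)
XA = rng.normal(size=(300, 4)) * np.array([3.0, 1.0, 0.5, 0.5])
XB = rng.normal(size=(60, 4)) * np.array([0.5, 0.5, 1.0, 3.0])
fair = fair_pca_1d_search(XA, XB, k=1)
print(fair["t"], fair["err_A"], fair["err_B"], fair["gap"])
```

Because the classical-PCA weight `t_pca` is included in the grid, the selected projection never has a larger error gap than classical PCA on the same data. Whether the overall reconstruction error stays close to the classical one depends on the dataset.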

PCA

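In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each retained dimension. Given a set of points in two, three, or more dimensions, the "best-fit" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fit line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.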
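The construction above can be sketched with an eigendecomposition of the sample covariance matrix. The example below is a minimal illustration; the function name `pca_eig` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def pca_eig(X, k):
    """Top-k principal directions (columns) and their variances for X (samples x features)."""
    Xc = X - X.mean(axis=0)                      # center the data
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest eigenvalues
    return eigvecs[:, order], eigvals[order]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.2])   # toy data, assumed
W, variances = pca_eig(X, k=2)
Z = (X - X.mean(axis=0)) @ W                     # data projected onto the principal components
print(variances)
print(np.cov(Z, rowvar=False))                   # off-diagonal entries are numerically zero
```

The columns of `W` are orthonormal, and the projected coordinates in `Z` are uncorrelated, which matches the description of the principal components as an orthogonal basis.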
