Feature selection is a prevalent data preprocessing paradigm for various learning tasks. Due to the high cost of acquiring supervision information, unsupervised feature selection has attracted great interest recently. However, existing unsupervised feature selection algorithms do not take fairness into consideration and run a high risk of amplifying discrimination by selecting features that are over-associated with protected attributes such as gender, race, and ethnicity. In this paper, we make an initial investigation of the fairness-aware unsupervised feature selection problem and develop a principled framework that leverages kernel alignment to find a subset of high-quality features that best preserve the information in the original feature space while being minimally correlated with protected attributes. In particular, unlike mainstream in-processing debiasing methods, our proposed framework can be regarded as a model-agnostic debiasing strategy that eliminates biases and discrimination before downstream learning algorithms are involved. Experimental results on multiple real-world datasets demonstrate that our framework achieves a good trade-off between utility maximization and fairness promotion.
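To make the kernel-alignment idea concrete, the sketch below illustrates one plausible instantiation; it is not the paper's actual algorithm. It greedily selects features whose linear-kernel Gram matrix has high centered kernel alignment with the full-feature kernel, while penalizing alignment with a kernel built from the protected attributes. The function names, the greedy search, the choice of linear kernels, and the trade-off weight `lam` are all hypothetical assumptions for illustration.

```python
import numpy as np

def linear_kernel(X):
    """Gram matrix of a linear kernel."""
    return X @ X.T

def center(K):
    """Center a kernel matrix: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def alignment(K1, K2):
    """Centered kernel alignment between two Gram matrices."""
    K1c, K2c = center(K1), center(K2)
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

def greedy_fair_selection(X, S, k, lam=1.0):
    """Greedily pick k features (hypothetical scheme): reward alignment
    with the full-feature kernel, penalize alignment with the
    protected-attribute kernel, traded off by lam."""
    K_full = linear_kernel(X)   # kernel on all original features
    K_prot = linear_kernel(S)   # kernel on protected attributes
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        def score(j):
            K_sub = linear_kernel(X[:, selected + [j]])
            return alignment(K_sub, K_full) - lam * alignment(K_sub, K_prot)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage with synthetic data (100 samples, 20 features,
# one binary protected attribute).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
S = rng.integers(0, 2, size=(100, 1)).astype(float)
print(greedy_fair_selection(X, S, k=5, lam=0.5))
```

Under these assumptions, `lam` controls the utility-fairness trade-off: `lam = 0` reduces to purely utility-driven selection, while larger values increasingly suppress features correlated with the protected attributes.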