Feature selection is a prevalent data preprocessing paradigm for various learning tasks. Due to the high cost of acquiring supervision information, unsupervised feature selection has attracted great interest recently. However, existing unsupervised feature selection algorithms do not take fairness into consideration and run a high risk of amplifying discrimination by selecting features that are over-associated with protected attributes such as gender, race, and ethnicity. In this paper, we make an initial investigation of the fairness-aware unsupervised feature selection problem and develop a principled framework that leverages kernel alignment to find a subset of high-quality features that best preserve the information in the original feature space while being minimally correlated with protected attributes. In particular, unlike mainstream in-processing debiasing methods, our proposed framework can be regarded as a model-agnostic debiasing strategy that eliminates biases and discrimination before downstream learning algorithms are involved. Experimental results on multiple real-world datasets demonstrate that our framework achieves a good trade-off between utility maximization and fairness promotion.
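To make the kernel-alignment idea concrete, the sketch below illustrates one plausible instantiation; it is not the paper's actual algorithm. It greedily selects features whose linear-kernel Gram matrix has high centered kernel alignment with the full-feature kernel, while penalizing alignment with a kernel built from the protected attributes. The function names, the greedy search, the choice of linear kernels, and the trade-off weight `lam` are all hypothetical assumptions for illustration.

```python
import numpy as np

def linear_kernel(X):
    """Gram matrix of a linear kernel."""
    return X @ X.T

def center(K):
    """Center a kernel matrix: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def alignment(K1, K2):
    """Centered kernel alignment between two Gram matrices."""
    K1c, K2c = center(K1), center(K2)
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

def greedy_fair_selection(X, S, k, lam=1.0):
    """Greedily pick k features (hypothetical scheme): reward alignment
    with the full-feature kernel, penalize alignment with the
    protected-attribute kernel, traded off by lam."""
    K_full = linear_kernel(X)   # kernel on all original features
    K_prot = linear_kernel(S)   # kernel on protected attributes
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        def score(j):
            K_sub = linear_kernel(X[:, selected + [j]])
            return alignment(K_sub, K_full) - lam * alignment(K_sub, K_prot)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage with synthetic data (100 samples, 20 features,
# one binary protected attribute).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
S = rng.integers(0, 2, size=(100, 1)).astype(float)
print(greedy_fair_selection(X, S, k=5, lam=0.5))
```

Under these assumptions, `lam` controls the utility-fairness trade-off: `lam = 0` reduces to purely utility-driven selection, while larger values increasingly suppress features correlated with the protected attributes.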