Fairness in machine learning has received significant attention due to its widespread application in high-stakes decision-making tasks. Unregulated machine learning classifiers can exhibit bias towards certain demographic groups in the data; thus, the quantification and mitigation of classifier bias is a central concern in fair machine learning. In this paper, we aim to quantify the influence of different features in a dataset on the bias of a classifier. To do this, we introduce the Fairness Influence Function (FIF), which decomposes bias into components attributable to individual features and to intersections of multiple features. The key idea is to represent existing group fairness metrics as the difference of scaled conditional variances in the classifier's prediction and to apply a variance decomposition from global sensitivity analysis. To estimate FIFs, we instantiate an algorithm, FairXplainer, that applies variance decomposition of the classifier's prediction following local regression. Experiments demonstrate that FairXplainer captures the FIFs of individual and intersectional features, provides a better approximation of bias based on FIFs, shows a higher correlation of FIFs with fairness interventions, and detects changes in bias due to fairness-affirmative/punitive actions in the classifier.
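As a minimal illustrative sketch of the underlying machinery (standard definitions, not the paper's exact formulation), the group fairness metric of statistical parity and the classical Sobol variance decomposition from global sensitivity analysis can be written as follows, where $\hat{Y}$ denotes the classifier's prediction, $A$ a sensitive attribute, and $X_1, \dots, X_k$ the features (assumed notation, not taken from the abstract):
\begin{align*}
  \mathrm{SP}(\hat{Y}, A) &= \bigl|\Pr[\hat{Y}=1 \mid A=1] - \Pr[\hat{Y}=1 \mid A=0]\bigr| && \text{(statistical parity)}\\
  \operatorname{Var}[\hat{Y}] &= \sum_{i} V_i + \sum_{i<j} V_{ij} + \dots + V_{1 2 \dots k} && \text{(Sobol decomposition)}\\
  \text{with } V_i &= \operatorname{Var}_{X_i}\!\bigl[\mathbb{E}[\hat{Y} \mid X_i]\bigr], \quad V_{ij} = \operatorname{Var}_{X_i, X_j}\!\bigl[\mathbb{E}[\hat{Y} \mid X_i, X_j]\bigr] - V_i - V_j.
\end{align*}
Under this reading, expressing the bias metric via scaled conditional variances and decomposing those variances term by term yields per-feature and per-intersection contributions, which is the role the FIF plays in the abstract's description.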