In recent years, there has been a flurry of research focusing on the fairness of machine learning models, and in particular on quantifying and eliminating bias against protected subgroups. One line of work generalizes the notion of protected subgroups beyond simple discrete classes by introducing the notion of a "rich subgroup", and seeks to train models that are calibrated or equalize error rates with respect to these richer subgroup classes. Largely orthogonally, local model explanation methods have been developed that, given a classifier h and a test point x, attribute influence for the prediction h(x) to the individual features of x. This raises a natural question: Do local model explanation methods attribute different feature importance values on average across different protected subgroups, and can we detect these disparities efficiently? If the model places high weight on a given feature in a specific protected subgroup, but not on the dataset overall (or vice versa), this could be a potential indicator of bias in the predictive model or the underlying data generating process, and is at the very least a useful diagnostic that signals the need for a domain expert to delve deeper. In this paper, we formally introduce the notion of feature importance disparity (FID) in the context of rich subgroups, design oracle-efficient algorithms to identify large FID subgroups, and conduct a thorough empirical analysis that establishes auditing for FID as an important method to investigate dataset bias. Our experiments show that across 4 datasets and 4 common feature importance methods, our algorithms find (feature, subgroup) pairs that simultaneously: (i) have subgroup feature importance that is often an order of magnitude different from the importance on the dataset as a whole, (ii) generalize out of sample, and (iii) yield interesting discussions about potential bias inherent in these datasets.
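To make the audited quantity concrete, the sketch below computes a simple feature importance disparity for one feature and one fixed subgroup: the absolute gap between the feature's mean attribution magnitude on the subgroup and on the dataset as a whole. This is an illustrative assumption about the formalization, not necessarily the paper's exact FID definition, and it uses SHAP as the local explanation method on a synthetic dataset with a hand-picked subgroup mask; the paper instead searches for large-FID subgroups over a rich class of functions of the protected attributes via oracle-efficient algorithms, which is not reproduced here.

```python
# Minimal sketch (not the paper's algorithm): feature importance disparity for a
# single feature and a fixed subgroup, using SHAP attributions as the importance measure.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

def fid_for_feature(model, X, subgroup_mask, feature_idx):
    """Absolute gap between the feature's mean importance on the subgroup and on all of X."""
    explainer = shap.Explainer(model, X)        # local attribution method (SHAP here)
    attributions = explainer(X).values          # per-example attributions, shape (n_samples, n_features)
    imp = np.abs(attributions[:, feature_idx])  # importance magnitude of the chosen feature
    return abs(imp[subgroup_mask].mean() - imp.mean())

# Illustrative usage on synthetic data; the subgroup here is hand-picked, whereas the
# paper searches over rich subgroups defined by functions of the protected attributes.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.1, size=500) > 0).astype(int)
model = GradientBoostingClassifier().fit(X, y)
mask = X[:, 1] > 0.0                            # hypothetical subgroup: examples with feature 1 above zero
print(fid_for_feature(model, X, mask, feature_idx=3))
```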