Machine learning algorithms are increasingly used for consequential decision making about individuals based on their relevant features. Features that are relevant for accurate decisions may, however, lead to either explicit or implicit forms of discrimination against unprivileged groups, such as those of a certain race or gender. This happens due to existing biases in the training data, which are often replicated or even exacerbated by the learning algorithm. Identifying and measuring these biases at the data level is a challenging problem due to the interdependence among the features and the decision outcome. In this work, we develop a framework for fairness-aware feature selection, based on information-theoretic measures of the accuracy and discriminatory impacts of features. Specifically, our goal is to design a fairness utility score for each feature which quantifies how that feature influences accurate as well as nondiscriminatory decisions. We first propose information-theoretic measures for the impact of different subsets of features on the accuracy and discrimination of the model. Subsequently, we deduce the marginal impact of each feature using the Shapley value function. Our framework depends on the joint statistics of the data rather than on a particular classifier design. We evaluate the performance of the proposed framework on real and synthetic data.
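To make the Shapley-value step concrete, the following is a minimal sketch of how a feature's marginal impact can be computed when the value of a subset S is taken to be the empirical mutual information I(X_S; Y) with the label. This covers only the accuracy side of the utility (the abstract's full score also incorporates a discrimination measure); the function names and the toy data are illustrative, not from the paper.

```python
import itertools
import math
from collections import Counter

def mutual_info(samples, idxs):
    """Empirical mutual information I(X_S; Y) in bits, where each sample is a
    tuple of feature values with the label Y in the last position and idxs
    selects the feature subset S. Returns 0 for the empty subset."""
    n = len(samples)
    joint = Counter((tuple(s[i] for i in idxs), s[-1]) for s in samples)
    px = Counter(tuple(s[i] for i in idxs) for s in samples)
    py = Counter(s[-1] for s in samples)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p_xy * log2( p_xy / (p_x * p_y) ), with the n's folded in
        mi += p_xy * math.log2(p_xy * n * n / (px[x] * py[y]))
    return mi

def shapley_scores(samples, n_features):
    """Shapley value of each feature under the value function v(S) = I(X_S; Y):
    phi_i = sum over S not containing i of
            |S|! (n-|S|-1)! / n! * [v(S + {i}) - v(S)]."""
    fact = math.factorial
    scores = []
    for i in range(n_features):
        rest = [j for j in range(n_features) if j != i]
        phi = 0.0
        for r in range(len(rest) + 1):
            for S in itertools.combinations(rest, r):
                w = fact(len(S)) * fact(n_features - len(S) - 1) / fact(n_features)
                phi += w * (mutual_info(samples, S + (i,)) - mutual_info(samples, S))
        scores.append(phi)
    return scores

# Toy dataset: Y = X0 XOR X1 with uniform inputs. Neither feature alone is
# informative (I(X0;Y) = I(X1;Y) = 0), yet together they determine Y, so the
# Shapley values split the joint contribution I(X0,X1;Y) = 1 bit equally.
samples = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(shapley_scores(samples, 2))
```

The exhaustive enumeration of subsets is exponential in the number of features, so in practice the Shapley sum would be approximated (e.g. by sampling permutations); the XOR example also illustrates why per-feature marginals alone can miss interaction effects that the Shapley decomposition captures.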