Machine learning algorithms that aid human decision-making may inadvertently discriminate against certain protected groups. We formalize direct discrimination as a direct causal effect of the protected attributes on the decisions, and induced discrimination as a change in the causal influence of non-protected features associated with the protected attributes. Measurements of the marginal direct effect (MDE) and SHapley Additive exPlanations (SHAP) reveal that state-of-the-art fair learning methods can induce discrimination via association or reverse discrimination in synthetic and real-world datasets. To inhibit discrimination in algorithmic systems, we propose to nullify the influence of the protected attribute on the output of the system, while preserving the influence of the remaining features. We introduce and study post-processing methods achieving such objectives, finding that they yield relatively high model accuracy, prevent direct discrimination, and diminish various disparity measures, e.g., demographic disparity.
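As a rough illustration of the nullification idea (a minimal sketch, not the paper's exact estimator), one simple post-processing scheme averages a trained model's output over interventions on the protected attribute, removing its direct effect on the decision while leaving the remaining features' influence intact. The function name and the `model.predict_proba` interface below are assumptions for illustration.

```python
import numpy as np

def nullify_protected_influence(model, X, protected_col, protected_values):
    """Average model outputs over all values of the protected attribute.

    Setting the protected column to each possible value and averaging the
    resulting scores removes the direct causal path from the protected
    attribute to the output; all other feature values are left unchanged.
    """
    outputs = []
    for v in protected_values:
        X_v = X.copy()
        X_v[:, protected_col] = v  # intervene on the protected attribute
        outputs.append(model.predict_proba(X_v)[:, 1])
    # Uniform average over the interventions: the final score no longer
    # depends on the individual's actual protected attribute value.
    return np.mean(outputs, axis=0)
```

Under this sketch, two individuals who differ only in the protected attribute receive identical scores by construction, which is one way to operationalize "no direct discrimination" at prediction time.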