Algorithmic fairness has recently attracted considerable interest in the data mining and machine learning communities. So far, existing research has mostly focused on developing quantitative metrics that measure algorithmic disparities across protected groups, and on approaches for adjusting algorithm outputs to reduce such disparities. In this paper, we propose to study the problem of identifying the source of model disparities. Unlike existing interpretation methods, which typically learn feature importance, we consider the causal relationships among feature variables and propose a novel framework that decomposes the disparity into a sum of contributions from fairness-aware causal paths, i.e., paths on the causal graph linking the sensitive attribute to the final prediction. We also consider the scenario in which the directions of certain edges within those paths cannot be determined. Our framework is model agnostic and applicable to a variety of quantitative disparity measures. Empirical evaluations on both synthetic and real-world data sets show that our method provides precise and comprehensive explanations of model disparities.
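To make the idea of a path-level disparity decomposition concrete, the following is a minimal sketch on a toy linear structural causal model. The variable names (A, M, Yhat), the coefficients (alpha, beta, gamma), and the linear functional form are illustrative assumptions for this sketch only, not the paper's actual estimator; in the linear case the group disparity in mean predictions splits exactly into one term per causal path from the sensitive attribute to the prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Toy linear structural causal model (hypothetical, for illustration):
#   A    ~ Bernoulli(0.5)          sensitive attribute
#   M    = alpha * A + noise       mediator influenced by A
#   Yhat = beta * A + gamma * M    model prediction
alpha, beta, gamma = 1.5, 0.4, 0.8
A = rng.integers(0, 2, n)
M = alpha * A + rng.normal(0.0, 1.0, n)
Yhat = beta * A + gamma * M

# Total disparity: difference in mean prediction across groups.
total = Yhat[A == 1].mean() - Yhat[A == 0].mean()

# Path-specific contributions, read off the path coefficients:
#   direct path   A -> Yhat        contributes beta
#   indirect path A -> M -> Yhat   contributes alpha * gamma
direct = beta
indirect = alpha * gamma

print(f"total disparity     : {total:.3f}")
print(f"direct   A->Yhat    : {direct:.3f}")
print(f"indirect A->M->Yhat : {indirect:.3f}")
print(f"sum of path terms   : {direct + indirect:.3f}")
```

Running this prints a total disparity close to beta + alpha * gamma = 1.6, matching the sum of the two path contributions; a general framework of the kind described above would produce an analogous decomposition without assuming linearity or a particular disparity measure.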