When a machine-learning algorithm makes biased decisions, it can be helpful to understand the sources of disparity to explain why the bias exists. Toward this end, we examine the problem of quantifying the contribution of each individual feature to the observed disparity. If we have access to the decision-making model, one potential approach (inspired by intervention-based approaches in the explainability literature) is to vary each individual feature (while keeping the others fixed) and use the resulting change in disparity to quantify its contribution. However, we may not have access to the model or be able to test/audit its outputs while varying features individually. Furthermore, the decision may not always be a deterministic function of the input features (e.g., with a human in the loop). In these situations, we may need to explain contributions using purely distributional (i.e., observational) techniques rather than interventional ones. We ask: what is the "potential" contribution of each individual feature to the observed disparity in the decisions when the exact decision-making mechanism is not accessible? We first provide canonical examples (thought experiments) that help illustrate the difference between distributional and interventional approaches to explaining contributions, and when each is better suited. When unable to intervene on the inputs, we quantify the "redundant" statistical dependency about the protected attribute that is present in both the final decision and an individual feature, by leveraging a body of work in information theory called Partial Information Decomposition. We also perform a simple case study to show how this technique could be applied to quantify contributions.
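To make the distributional idea concrete, the redundancy term can be estimated from an observed joint distribution alone, with no interventions on the model. The sketch below assumes the Williams–Beer I_min redundancy measure (one common PID redundancy definition; the abstract does not fix a particular choice) and a hypothetical toy joint where both the feature X and the decision D copy the protected attribute Z:

```python
from math import log2

# Toy joint distribution over (z, x, d): Z ~ Bernoulli(0.5), X = Z, D = Z.
# Keys are value tuples, values are probabilities.
p = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

def marginal(joint, idx):
    """Marginalize the joint onto the variables at positions `idx`."""
    m = {}
    for k, v in joint.items():
        key = tuple(k[i] for i in idx)
        m[key] = m.get(key, 0.0) + v
    return m

def specific_info(joint, target_idx, source_idx):
    """Specific information I(Z=z; S) for each value z (Williams & Beer)."""
    pz = marginal(joint, [target_idx])
    pzs = marginal(joint, [target_idx, source_idx])
    ps = marginal(joint, [source_idx])
    si = {}
    for (z,), pzv in pz.items():
        total = 0.0
        for (z2, s), pv in pzs.items():
            if z2 != z:
                continue
            p_s_given_z = pv / pzv          # p(s | z)
            p_z_given_s = pv / ps[(s,)]     # p(z | s)
            total += p_s_given_z * log2(p_z_given_s / pzv)
        si[z] = total
    return si

def i_min(joint, target_idx, source_idxs):
    """Williams-Beer redundancy: expected minimum specific information."""
    pz = marginal(joint, [target_idx])
    sis = [specific_info(joint, target_idx, s) for s in source_idxs]
    return sum(pzv * min(si[z] for si in sis) for (z,), pzv in pz.items())

# Redundant information that X (index 1) and D (index 2) share about Z (index 0).
red = i_min(p, target_idx=0, source_idxs=[1, 2])
print(red)  # → 1.0 bit: both the feature and the decision fully encode Z
```

Because X and D each determine Z completely in this toy distribution, the redundancy equals the full 1 bit of entropy in Z; in realistic data it would lie between 0 and min(I(Z;X), I(Z;D)).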