A wide variety of fairness metrics and eXplainable Artificial Intelligence (XAI) approaches have been proposed in the literature to identify bias in machine learning models that are used in critical real-life contexts. However, merely reporting a model's bias or generating explanations with existing XAI techniques is insufficient to locate and eventually mitigate the sources of that bias. In this work, we introduce Gopher, a system that produces compact, interpretable, and causal explanations for bias or unexpected model behavior by identifying coherent subsets of the training data that are root causes of this behavior. Specifically, we introduce the concept of causal responsibility, which quantifies the extent to which intervening on the training data by removing or updating subsets of it can resolve the bias. Building on this concept, we develop an efficient approach for generating the top-k patterns that explain model bias; the approach utilizes techniques from the ML community to approximate causal responsibility and uses pruning rules to manage the large search space of candidate patterns. Our experimental evaluation demonstrates the effectiveness of Gopher in generating interpretable explanations for identifying and debugging sources of bias.
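To make the notion of intervening on training-data subsets concrete, the following is a minimal, hypothetical sketch of the intuition behind causal responsibility: how much does a bias measure shrink if we remove the rows matching a candidate pattern and retrain the model? The column names, the pattern, the use of logistic regression, and the choice of statistical parity difference as the fairness metric are all illustrative assumptions, not Gopher's actual interface or algorithm.

```python
# Sketch: estimate how much removing a pattern-defined training subset
# reduces model bias (statistical parity difference). Hypothetical data
# and pattern; not Gopher's actual implementation.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n: int) -> pd.DataFrame:
    df = pd.DataFrame({
        "gender": rng.integers(0, 2, n),          # protected attribute (0/1)
        "education": rng.integers(0, 4, n),
        "hours_per_week": rng.integers(20, 60, n),
    })
    # Synthetic labels that depend on the protected attribute,
    # so the learned model exhibits bias.
    logit = 0.8 * df["gender"] + 0.05 * df["hours_per_week"] - 2.5
    df["label"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
    return df

train, test = make_data(5000), make_data(2000)
features = ["gender", "education", "hours_per_week"]

def parity_difference(train_df: pd.DataFrame) -> float:
    """Train on train_df, then measure statistical parity difference
    of predictions on a fixed test set: P(yhat=1 | g=1) - P(yhat=1 | g=0)."""
    model = LogisticRegression(max_iter=1000).fit(train_df[features], train_df["label"])
    pred = model.predict(test[features])
    g = test["gender"].to_numpy()
    return pred[g == 1].mean() - pred[g == 0].mean()

# A candidate pattern: a conjunction of attribute-value predicates.
pattern = {"gender": 0, "education": 1}
mask = np.logical_and.reduce([(train[c] == v).to_numpy() for c, v in pattern.items()])

bias_before = parity_difference(train)
bias_after = parity_difference(train[~mask])   # intervene: drop matching rows, retrain

print(f"pattern {pattern} covers {mask.mean():.1%} of the training data")
print(f"bias before: {bias_before:+.3f}, after removal: {bias_after:+.3f}")
print(f"bias reduction attributable to the subset: {bias_before - bias_after:+.3f}")
```

Retraining from scratch for every candidate pattern is expensive, which is why the abstract mentions approximating causal responsibility with ML techniques and pruning the pattern search space rather than enumerating and retraining exhaustively.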