There has been a recent resurgence of interest in explainable artificial intelligence (XAI), which aims to reduce the opacity of AI-based decision-making systems so that humans can scrutinize and trust them. Prior work in this context has focused on attributing responsibility for an algorithm's decisions to its inputs, where responsibility is typically treated as a purely associational concept. In this paper, we propose a principled, causality-based approach for explaining black-box decision-making systems that addresses limitations of existing methods in XAI. At the core of our framework lie probabilistic contrastive counterfactuals, a concept that can be traced back to the philosophical, cognitive, and social foundations of theories on how humans generate and select explanations. We show how such counterfactuals can quantify the direct and indirect influence of a variable on the decisions made by an algorithm, and can provide actionable recourse for individuals negatively affected by the algorithm's decisions. Unlike prior work, our system, LEWIS: (1) can compute provably effective explanations and recourse at local, global, and contextual levels; (2) is designed to work with users who have varying levels of background knowledge of the underlying causal model; and (3) makes no assumptions about the internals of the algorithmic system except for the availability of its input-output data. We empirically evaluate LEWIS on three real-world datasets and show that it generates human-understandable explanations that improve upon state-of-the-art approaches in XAI, including the popular LIME and SHAP. Experiments on synthetic data further demonstrate the correctness of LEWIS's explanations and the scalability of its recourse algorithm.