Machine learning plays a role in many deployed decision systems, often in ways that are difficult or impossible for human stakeholders to understand. Explaining, in a human-understandable way, the relationship between the input and output of machine learning models is essential to the development of trustworthy machine-learning-based systems. A burgeoning body of research seeks to define the goals and methods of explainability in machine learning. In this paper, we review and categorize research on counterfactual explanations, a specific class of explanation that describes how a model's output would have changed had its input been altered in a particular way. Modern approaches to counterfactual explainability in machine learning draw connections to established legal doctrine in many countries, making them appealing for fielded systems in high-impact areas such as finance and healthcare. We therefore design a rubric of desirable properties of counterfactual explanation algorithms and comprehensively evaluate all currently proposed algorithms against it. Our rubric provides easy comparison and comprehension of the advantages and disadvantages of different approaches and serves as an introduction to the major research themes in this field. We also identify gaps and discuss promising research directions in the space of counterfactual explainability.