Machine learning plays a role in many deployed decision systems, often in ways that are difficult or impossible for human stakeholders to understand. Explaining, in a human-understandable way, the relationship between the input and output of machine learning models is essential to the development of trustworthy machine-learning-based systems. A burgeoning body of research seeks to define the goals and methods of \emph{explainability} in machine learning. In this paper, we review and categorize research on \emph{counterfactual explanations}, a specific class of explanation that describes how a model's output would have changed had its input been altered in a particular way. Modern approaches to counterfactual explainability in machine learning draw connections to established legal doctrine in many countries, making them appealing for fielded systems in high-impact areas such as finance and healthcare. Thus, we design a rubric of desirable properties for counterfactual explanation algorithms and comprehensively evaluate all currently proposed algorithms against that rubric. Our rubric enables easy comparison and comprehension of the advantages and disadvantages of different approaches and serves as an introduction to the major research themes in this field. We also identify gaps and discuss promising research directions in the space of counterfactual explainability.