Machine learning plays a role in many deployed decision systems, often in ways that are difficult or impossible for human stakeholders to understand. Explaining, in a human-understandable way, the relationship between the input and output of machine learning models is essential to the development of trustworthy machine-learning-based systems. A burgeoning body of research seeks to define the goals and methods of explainability in machine learning. In this paper, we seek to review and categorize research on counterfactual explanations, a specific class of explanation that describes what would have happened had the input to a model been changed in a particular way. Modern approaches to counterfactual explainability in machine learning draw connections to established legal doctrine in many countries, making them appealing for fielded systems in high-impact areas such as finance and healthcare. Thus, we design a rubric with desirable properties of counterfactual explanation algorithms and comprehensively evaluate all currently proposed algorithms against that rubric. Our rubric provides easy comparison and comprehension of the advantages and disadvantages of different approaches and serves as an introduction to major research themes in this field. We also identify gaps and discuss promising research directions in the space of counterfactual explainability.