As machine learning algorithms are increasingly applied to high-impact yet high-risk tasks, such as medical diagnosis or autonomous driving, it is critical that researchers can explain how such algorithms arrived at their predictions. In recent years, a number of image saliency methods have been developed to summarize where highly complex neural networks "look" in an image for evidence for their predictions. However, these techniques are limited by their heuristic nature and architectural constraints. In this paper, we make two main contributions: First, we propose a general framework for learning different kinds of explanations for any black box algorithm. Second, we specialise the framework to find the part of an image most responsible for a classifier decision. Unlike previous works, our method is model-agnostic and testable because it is grounded in explicit and interpretable image perturbations.
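To make the idea of explanation by "explicit and interpretable image perturbations" concrete, the sketch below illustrates one simple instantiation: optimize a soft mask that blends an image with a blurred copy so that the classifier's score for a target class drops, while penalizing how much of the image is perturbed. This is only an illustrative sketch under several assumptions, not the paper's exact formulation: the pretrained `resnet50` classifier, the blur reference, the function name `perturbation_saliency`, the L1 weight, and the step count are all hypothetical choices, and the full objective in the paper includes additional regularization terms not shown here.

```python
# Illustrative sketch of perturbation-based saliency (assumptions: a pretrained
# torchvision classifier, a blurred copy of the image as the "deleted" reference,
# and a single L1 area penalty; the paper's full objective adds further terms).
import torch
import torch.nn.functional as F
from torchvision import models


def perturbation_saliency(image, target_class, steps=300, lr=0.1, area_weight=0.05):
    """Learn a mask m in [0, 1]: keep the image where m is near 1, blur it where
    m is near 0, so that the target-class probability drops while perturbing as
    little of the image as possible. Returns (1 - m) as the saliency map."""
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
    for p in model.parameters():
        p.requires_grad_(False)

    # Blurred reference image: stands in for "removing" evidence from a region.
    blurred = F.avg_pool2d(image, kernel_size=11, stride=1, padding=5)

    # Parameterize the mask through a sigmoid so it stays in [0, 1].
    mask_logits = torch.zeros(1, 1, image.shape[2], image.shape[3], requires_grad=True)
    opt = torch.optim.Adam([mask_logits], lr=lr)

    for _ in range(steps):
        mask = torch.sigmoid(mask_logits)
        perturbed = mask * image + (1 - mask) * blurred
        prob = F.softmax(model(perturbed), dim=1)[0, target_class]
        # Minimize the target probability plus a penalty on the perturbed area,
        # so only the regions most responsible for the decision get blurred.
        loss = prob + area_weight * (1 - mask).abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Regions where the mask drops toward 0 are the evidence the classifier relies on.
    return (1 - torch.sigmoid(mask_logits)).detach()
```

A usage example would pass a normalized `(1, 3, 224, 224)` image tensor and an ImageNet class index; the returned map can be overlaid on the input to visualize which region's deletion most reduces the classifier's confidence.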