Existing explanation tools for image classifiers usually give only a single explanation for an image's classification. For many images, however, both humans and image classifiers accept more than one explanation for the image label. Thus, restricting the number of explanations to just one is arbitrary and severely limits the insight into the behavior of the classifier. In this paper, we describe an algorithm and a tool, MultiReX, for computing multiple explanations of the output of a black-box image classifier for a given image. Our algorithm uses a principled approach based on causal theory. We analyse its theoretical complexity and provide experimental results showing that MultiReX finds multiple explanations on 96% of the images in the ImageNet-mini benchmark, whereas previous work finds multiple explanations only on 11%.
翻译:暂无翻译