Visual counterfactual explanations (VCEs) in image space are an important tool for understanding the decisions of image classifiers, as they show which changes to an image would alter the classifier's decision. Generating them in image space is challenging and requires robust models due to the problem of adversarial examples. Existing techniques for generating VCEs in image space suffer from spurious changes in the background. Our novel perturbation model for VCEs, together with its efficient optimization via our novel Auto-Frank-Wolfe scheme, yields sparse VCEs that make subtle changes specific to the target class. Moreover, we show that VCEs can be used to detect undesired behavior of ImageNet classifiers caused by spurious features in the ImageNet dataset.
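The Auto-Frank-Wolfe scheme itself is not specified in this abstract. As a minimal sketch of the underlying idea, vanilla Frank-Wolfe over an L1 ball illustrates why such a scheme tends to produce sparse solutions: the linear minimization oracle over the L1 ball always returns a 1-sparse vertex. The function name `frank_wolfe_l1` and the toy quadratic objective below are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def frank_wolfe_l1(grad_f, x0, radius, steps=200):
    """Vanilla Frank-Wolfe over an L1 ball of the given radius.

    Illustrative sketch only (not the paper's Auto-Frank-Wolfe).
    The linear minimization oracle over the L1 ball returns a
    1-sparse vertex, which is what encourages sparse iterates.
    """
    x = x0.copy()
    for t in range(steps):
        g = grad_f(x)
        i = np.argmax(np.abs(g))            # coordinate of steepest descent
        s = np.zeros_like(x)
        s[i] = -radius * np.sign(g[i])      # vertex of the L1 ball
        gamma = 2.0 / (t + 2.0)             # standard open-loop step size
        x = (1.0 - gamma) * x + gamma * s   # convex combination stays feasible
    return x

# toy objective: f(x) = 0.5 * ||x - b||^2, minimized over ||x||_1 <= 1;
# the minimizer is the projection of b onto the L1 ball, here (1, 0, 0)
b = np.array([3.0, 0.5, -0.2])
x_star = frank_wolfe_l1(lambda x: x - b, np.zeros(3), radius=1.0)
```

In a VCE setting, `grad_f` would be the gradient of the classifier's loss for the target class with respect to the image, and the L1 constraint would bound the total perturbation, concentrating changes on a few pixels rather than spreading them over the background.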