Visual Counterfactual Explanations (VCEs) are an important tool for understanding the decisions of an image classifier. They are 'small' but 'realistic' semantic changes to the image that alter the classifier's decision. Current approaches for generating VCEs are restricted to adversarially robust models and often contain non-realistic artefacts, or are limited to image classification problems with few classes. In this paper, we overcome this by generating Diffusion Visual Counterfactual Explanations (DVCEs) for arbitrary ImageNet classifiers via a diffusion process. Two modifications to the diffusion process are key for our DVCEs: first, an adaptive parameterization, whose hyperparameters generalize across images and models, together with distance regularization and a late start of the diffusion process, allows us to generate images with minimal semantic changes relative to the original ones but a different classification. Second, our cone regularization via an adversarially robust model ensures that the diffusion process does not converge to trivial non-semantic changes, but instead produces realistic images of the target class to which the classifier assigns high confidence.
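To make the cone regularization concrete, the following is a minimal sketch of the underlying geometric operation: a Euclidean projection of the classifier's guidance gradient onto a cone of half-angle alpha around the robust model's gradient. The function name `cone_project`, the angle parameter `alpha_deg`, and the flattened-gradient interface are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def cone_project(g: torch.Tensor, r: torch.Tensor, alpha_deg: float = 30.0) -> torch.Tensor:
    """Euclidean projection of the classifier gradient g onto the cone of
    half-angle alpha around the robust-model gradient r (both flattened 1-D).

    Hypothetical helper illustrating cone regularization; the paper's exact
    parameterization may differ.
    """
    alpha = torch.deg2rad(torch.as_tensor(alpha_deg, dtype=g.dtype))
    r_hat = r / r.norm()
    g_par = g @ r_hat                          # signed magnitude of g along r
    g_perp = g - g_par * r_hat                 # component of g orthogonal to r
    theta = torch.atan2(g_perp.norm(), g_par)  # angle between g and r
    if theta <= alpha:
        return g                               # g already lies inside the cone
    if theta - alpha >= torch.pi / 2:
        return torch.zeros_like(g)             # closest cone point is the apex
    # closest point on the cone boundary, in the plane spanned by r and g
    boundary = torch.cos(alpha) * r_hat + torch.sin(alpha) * g_perp / g_perp.norm()
    return g.norm() * torch.cos(theta - alpha) * boundary
```

In a guided diffusion loop, one would apply such a projection to the target classifier's gradient at each denoising step, so that the update direction stays semantically aligned with the robust model rather than drifting toward adversarial, non-semantic perturbations.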