Counterfactual examples identify how inputs can be altered to change the predicted class of a classifier, thus opening up the black-box nature of, e.g., deep neural networks. We propose a method, ECINN, that utilizes the generative capacities of invertible neural networks trained for image classification to generate counterfactual examples efficiently. In contrast to competing methods that sometimes require a thousand or more evaluations of the classifier, ECINN has a closed-form expression and generates a counterfactual in the time of only two evaluations. Arguably, the main challenge of generating counterfactual examples is to alter only input features that affect the predicted outcome, i.e., class-dependent features. Our experiments demonstrate how ECINN alters class-dependent image regions to change both the perceptual and the predicted class of the counterfactuals. Additionally, we extend ECINN to also produce heatmaps (ECINNh) for easy inspection of, e.g., pairwise class-dependent changes in the generated counterfactual examples. Experimentally, we find that ECINNh outperforms established methods that generate heatmap-based explanations.