Image classification models can depend on multiple different semantic attributes of the image. An explanation of a classifier's decision must both discover and visualize these attributes. Here we present StylEx, a method that does this by training a generative model specifically to explain the multiple attributes that underlie classifier decisions. A natural source for such attributes is the StyleSpace of StyleGAN, which is known to contain semantically meaningful dimensions of the image. However, because standard GAN training does not depend on the classifier, it may fail to represent the attributes that matter for the classifier's decision, and the dimensions of StyleSpace may represent irrelevant attributes. To overcome this, we propose a training procedure for StyleGAN that incorporates the classifier model in order to learn a classifier-specific StyleSpace. Explanatory attributes are then selected from this space. These can be used to visualize the effect of changing multiple attributes per image, thus providing image-specific explanations. We apply StylEx to multiple domains, including animals, leaves, faces and retinal images. For these, we show how an image can be modified in different ways to change its classifier output. Our results show that the method finds attributes that align well with semantic ones, generate meaningful image-specific explanations, and are human-interpretable as measured in user studies.
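The abstract says that explanatory attributes are selected from the classifier-specific StyleSpace but does not spell out how. As a minimal sketch only, and not the StylEx implementation, the following assumes one plausible selection rule: shift each style coordinate by a fixed amount and rank coordinates by the mean absolute change in classifier output. The names `rank_style_coordinates`, `toy_generator`, and `toy_classifier`, as well as the `delta` parameter, are hypothetical stand-ins introduced for illustration; a real setup would use a trained StyleGAN generator and the classifier being explained.

```python
import numpy as np


def rank_style_coordinates(generator, classifier, styles, delta=1.0):
    """Rank style coordinates by how much shifting each one changes
    the classifier output, averaged over a batch of style vectors.

    Assumed selection rule for illustration; not taken from the paper.
    """
    base = classifier(generator(styles))          # shape (n,)
    n_coords = styles.shape[1]
    effects = np.zeros(n_coords)
    for k in range(n_coords):
        shifted = styles.copy()
        shifted[:, k] += delta                    # perturb one coordinate only
        changed = classifier(generator(shifted))
        effects[k] = np.mean(np.abs(changed - base))
    order = np.argsort(-effects)                  # largest effect first
    return order, effects


# Hypothetical toy stand-ins: 4 style coordinates, 16 "pixels".
def toy_generator(styles):
    image = np.zeros((styles.shape[0], 16))
    image[:, :4] = styles                         # pixel i copies style i
    return image


def toy_classifier(image):
    # Depends strongly on pixel 2, weakly on pixel 0, and on nothing else.
    logit = 3.0 * image[:, 2] + 0.5 * image[:, 0]
    return 1.0 / (1.0 + np.exp(-logit))


rng = np.random.default_rng(0)
styles = rng.normal(size=(200, 4))
order, effects = rank_style_coordinates(toy_generator, toy_classifier, styles)
print("coordinates ranked by effect:", order)
print("mean absolute change per coordinate:", effects)
```

In this toy setting the ranking puts coordinate 2 first and coordinate 0 second, while coordinates 1 and 3, which the toy classifier ignores, have zero effect. This mirrors the abstract's point that some StyleSpace dimensions may be irrelevant to the classifier.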