Many problems in computer vision have recently been tackled using models whose predictions cannot be easily interpreted, most commonly deep neural networks. Surrogate explainers are a popular post-hoc interpretability method for further understanding how a model arrives at a particular prediction. By training a simple, more interpretable model to locally approximate the decision boundary of a non-interpretable system, we can estimate the relative importance of the input features to the prediction. Focusing on images, surrogate explainers such as LIME generate a local neighbourhood around a query image by sampling in an interpretable domain. However, these interpretable domains have traditionally been derived exclusively from the intrinsic features of the query image, without taking into account the manifold of the data the non-interpretable model has been exposed to during training (or, more generally, the manifold of real images). This leads to suboptimal surrogates trained on potentially low-probability images. We address this limitation by aligning the local neighbourhood on which the surrogate is trained with the original training data distribution, even when this distribution is not accessible. We propose two approaches to do so, namely (1) altering the method for sampling the local neighbourhood and (2) using perceptual metrics to convey some of the properties of the distribution of natural images.
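To make the neighbourhood-sampling step concrete, the sketch below illustrates a LIME-style surrogate explainer for images under simplifying assumptions: the query image is segmented into superpixels (the interpretable domain), the neighbourhood is generated by randomly masking superpixels, and a weighted linear model is fitted to the black-box predictions. Here `predict_fn` stands for a hypothetical black-box classifier returning class probabilities, and the mean-colour fill, exponential proximity kernel, and `Ridge` surrogate are illustrative choices, not the exact configuration proposed in the paper.

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.linear_model import Ridge

def surrogate_explanation(image, predict_fn, n_segments=50,
                          n_samples=1000, kernel_width=0.25, seed=0):
    """LIME-style sketch: perturb superpixels, fit a weighted linear surrogate."""
    segments = slic(image, n_segments=n_segments)       # interpretable domain
    seg_ids = np.unique(segments)
    rng = np.random.default_rng(seed)
    # Binary neighbourhood: which superpixels are kept in each sampled image
    z = rng.integers(0, 2, size=(n_samples, len(seg_ids)))
    z[0, :] = 1                                          # first sample = original image
    baseline = image.mean(axis=(0, 1))                   # mean-colour fill for masked segments
    preds, weights = [], []
    for row in z:
        perturbed = image.copy()
        for k, keep in zip(seg_ids, row):
            if not keep:
                perturbed[segments == k] = baseline
        preds.append(predict_fn(perturbed[np.newaxis])[0])   # black-box class probabilities
        d = 1.0 - row.mean()                                 # fraction of segments removed
        weights.append(np.exp(-(d ** 2) / kernel_width ** 2))  # proximity to the query
    target = np.array(preds)[:, np.argmax(preds[0])]     # probability of the query's top class
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(z, target, sample_weight=np.array(weights))
    return seg_ids, surrogate.coef_                      # per-superpixel importance
```

The surrogate coefficients estimate how much keeping each superpixel contributes to the black-box prediction; note that the sampled neighbourhood here depends only on the query image, which is precisely the limitation the abstract sets out to address.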