A wide variety of model explanation approaches have been proposed in recent years, each guided by different rationales and heuristics. In this paper, we take a new route and cast interpretability as a statistical inference problem. We propose a general deep probabilistic model designed to produce interpretable predictions. The model parameters can be learned via maximum likelihood, and the method can be adapted to any predictor network architecture and any type of prediction problem. Our method is an instance of amortized interpretability, where a neural network is used as a selector to allow for fast interpretation at inference time. Several popular interpretability methods are shown to be particular cases of regularised maximum likelihood for our general model. We propose new datasets with ground-truth selections that allow the evaluation of feature importance maps. Using these datasets, we show experimentally that multiple imputation provides more reasonable interpretations.
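To make the amortized-selector idea concrete, the following is a minimal, hypothetical sketch (not the paper's implementation): a selector network outputs per-feature selection probabilities, unselected features are filled in by multiple imputation (here, simply resampled from a fixed Gaussian), and the averaged predictions are produced by an arbitrary predictor network. All class, module, and parameter names are illustrative, and a practical version would replace the non-differentiable Bernoulli sampling with a continuous relaxation so the selector can be trained by maximum likelihood.

```python
# Hypothetical sketch of an amortized selector with multiple imputation.
# Not the paper's code; names and the imputing distribution are assumptions.
import torch
import torch.nn as nn

class AmortizedExplainer(nn.Module):
    def __init__(self, n_features, n_classes, hidden=64, n_imputations=5):
        super().__init__()
        # Selector network: outputs one selection logit per input feature.
        self.selector = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, n_features)
        )
        # Predictor network: any architecture suited to the prediction task.
        self.predictor = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, n_classes)
        )
        self.n_imputations = n_imputations
        # Toy imputing distribution: independent standard normal per feature.
        self.register_buffer("imp_mean", torch.zeros(n_features))
        self.register_buffer("imp_std", torch.ones(n_features))

    def forward(self, x):
        probs = torch.sigmoid(self.selector(x))   # P(feature selected | x): the importance map
        mask = torch.bernoulli(probs)             # sample one selection (relaxation needed for training)
        logits = []
        for _ in range(self.n_imputations):       # multiple imputation of unselected features
            noise = self.imp_mean + self.imp_std * torch.randn_like(x)
            x_imputed = mask * x + (1 - mask) * noise
            logits.append(self.predictor(x_imputed))
        return torch.stack(logits).mean(0), probs # averaged predictions and per-feature importances

# Usage: the importance map comes from a single forward pass, i.e. fast interpretation at inference time.
model = AmortizedExplainer(n_features=20, n_classes=2)
x = torch.randn(8, 20)
avg_logits, importance = model(x)
```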