Although deep models achieve high predictive performance, it is difficult for humans to understand the predictions they make. Explainability is important in real-world applications, where the reliability of a model's predictions must be justified. Many example-based explanation methods have been proposed, such as representer point selection, in which an explanation model defined by a set of training examples is used to explain a prediction model. To improve interpretability, it is important to reduce the number of examples in the explanation model. However, explanations with fewer examples can be unfaithful, since it is difficult for such example-based explanation models to approximate the prediction model well. An explanation is unfaithful when the predictions of the explanation model differ from those of the prediction model. We propose a method for training deep models such that their predictions are faithfully explained by explanation models with a small number of examples. We train the prediction and explanation models simultaneously, with a sparse regularizer that reduces the number of examples. The proposed method can be incorporated into any neural network-based prediction model. Experiments on several datasets demonstrate that the proposed method improves faithfulness while maintaining predictive performance.
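The joint training idea described above can be illustrated with a minimal sketch. The abstract does not give the exact objective, so everything here is an assumption made for illustration: a scalar regression setting, an explanation model that predicts with a weighted sum of inner-product similarities to training examples, a squared-error faithfulness term, and an L1 penalty on the per-example weights. The names `joint_objective`, `count_active_examples`, `lam_faith`, and `lam_sparse` are hypothetical.

```python
import numpy as np


def joint_objective(preds, feats, train_feats, alpha, targets,
                    lam_faith=1.0, lam_sparse=0.1):
    """Hypothetical joint loss: task error + faithfulness gap + L1 sparsity.

    preds:       (batch,) outputs of the prediction model
    feats:       (batch, d) features of the inputs being explained
    train_feats: (n_train, d) features of the candidate training examples
    alpha:       (n_train,) weight of each training example in the explanation
    targets:     (batch,) ground-truth values
    """
    # Explanation model (assumed form): similarity-weighted sum over examples.
    similarity = feats @ train_feats.T           # (batch, n_train)
    explained = similarity @ alpha               # (batch,)

    task = np.mean((preds - targets) ** 2)       # predictive performance
    faithfulness = np.mean((preds - explained) ** 2)  # gap between the two models
    sparsity = np.sum(np.abs(alpha))             # L1 pushes example weights to zero

    total = task + lam_faith * faithfulness + lam_sparse * sparsity
    parts = {"task": task, "faithfulness": faithfulness, "sparsity": sparsity}
    return total, parts


def count_active_examples(alpha, tol=1e-8):
    """Number of training examples the explanation model actually uses."""
    return int(np.sum(np.abs(alpha) > tol))
```

In an actual training loop, the prediction model's parameters and `alpha` would be updated together by gradient descent on `total`; increasing `lam_sparse` would be expected to leave fewer examples in the explanation, at some cost to the other two terms.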