We consider generating explanations for neural networks in cases where the network's training data is not accessible, for instance due to privacy or safety concerns. Recently, $\mathcal{I}$-Nets have been proposed as a sample-free approach to post-hoc, global model interpretability that does not require access to training data. $\mathcal{I}$-Nets formulate interpretation as a machine learning task that maps network representations (parameters) to a representation of an interpretable function. In this paper, we extend the $\mathcal{I}$-Net framework to the cases of standard and soft decision trees as surrogate models. We propose a suitable decision tree representation and a corresponding design of the $\mathcal{I}$-Net output layers. Furthermore, we make $\mathcal{I}$-Nets applicable to real-world tasks by considering more realistic distributions when generating the $\mathcal{I}$-Net's training data. We empirically evaluate our approach against traditional global, post-hoc interpretability approaches and show that it achieves superior results when the training data is not accessible.
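As a minimal illustrative sketch (not the authors' implementation), the snippet below shows the core $\mathcal{I}$-Net idea described above: a small network takes the flattened parameters of a trained model as input and outputs the parameters of a soft decision tree surrogate. All names (`inet_forward`, `soft_tree_predict`), layer sizes, the tree depth, and the random initialisation are assumptions made for illustration; in practice the $\mathcal{I}$-Net is trained on many (network parameters, tree representation) pairs rather than randomly initialised.

```python
# Minimal sketch of the I-Net idea: map flattened network parameters
# to the parameters of a soft decision tree surrogate.
# Sizes, names, and random weights are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

def flatten_params(weights):
    """Concatenate all weight matrices and bias vectors into one vector."""
    return np.concatenate([w.ravel() for w in weights])

def inet_forward(theta, hidden=64, depth=3, n_features=4, n_classes=2):
    """One-hidden-layer I-Net mapping network parameters `theta` to
    soft decision tree parameters (inner-node splits + leaf logits).
    The I-Net weights are random here; a real I-Net would be trained."""
    n_inner = 2 ** depth - 1          # internal (split) nodes
    n_leaves = 2 ** depth
    out_dim = n_inner * (n_features + 1) + n_leaves * n_classes

    W1 = rng.normal(0, 0.1, (hidden, theta.size))
    W2 = rng.normal(0, 0.1, (out_dim, hidden))
    out = W2 @ np.tanh(W1 @ theta)

    splits = out[: n_inner * (n_features + 1)].reshape(n_inner, n_features + 1)
    leaves = out[n_inner * (n_features + 1):].reshape(n_leaves, n_classes)
    return splits, leaves

def soft_tree_predict(x, splits, leaves, depth=3):
    """Evaluate the soft decision tree: each inner node routes `x`
    left/right with a sigmoid gate; the output is the probability-weighted
    mixture of leaf logits, passed through a softmax."""
    n_leaves = 2 ** depth
    probs = np.ones(n_leaves)
    for leaf in range(n_leaves):
        node = 0
        for level in range(depth):
            w, b = splits[node, :-1], splits[node, -1]
            p_right = 1.0 / (1.0 + np.exp(-(w @ x + b)))
            go_right = (leaf >> (depth - 1 - level)) & 1
            probs[leaf] *= p_right if go_right else (1.0 - p_right)
            node = 2 * node + 1 + go_right   # heap-indexed child
    logits = probs @ leaves
    return np.exp(logits) / np.exp(logits).sum()

# Toy usage: flatten a (random) network and read off a surrogate tree.
lambda_net = [rng.normal(size=(4, 8)), rng.normal(size=8),
              rng.normal(size=(8, 2)), rng.normal(size=2)]
splits, leaves = inet_forward(flatten_params(lambda_net))
print(soft_tree_predict(rng.normal(size=4), splits, leaves))
```

The key design choice illustrated here is that the surrogate is fully determined by a fixed-size parameter vector, so a single forward pass of the $\mathcal{I}$-Net yields a global, interpretable model without querying any training data.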