Model interpretability is a requirement in many applications in which crucial decisions are made by users relying on a model's outputs. The recent movement for "algorithmic fairness" also stipulates explainability, and therefore interpretability, of learning models. And yet the most successful contemporary Machine Learning approaches, Deep Neural Networks, produce models that are highly non-interpretable. We attempt to address this challenge by proposing a technique called CNN-INTE to interpret deep Convolutional Neural Networks (CNNs) via meta-learning. In this work, we interpret a specific hidden layer of a deep CNN model on the MNIST image dataset. We use a clustering algorithm in a two-level structure to find the meta-level training data and Random Forests as base learning algorithms to generate the meta-level test data. The interpretation results are displayed visually via diagrams, which clearly indicate how a specific test instance is classified. Our method achieves global interpretation for all the test instances without sacrificing the accuracy obtained by the original deep CNN model. This means our model is faithful to the deep CNN model, which leads to reliable interpretations.
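To make the described pipeline concrete, the following is a minimal sketch, not the authors' implementation, of the kind of workflow the abstract outlines: two-level clustering over hidden-layer activations to form meta-level training data, and Random Forests as base learners producing meta-level test data. The arrays `hidden_acts` and `labels` are hypothetical stand-ins for the activations of one hidden CNN layer and the corresponding MNIST class labels.

```python
# Hedged sketch of a meta-learning interpretation pipeline (assumptions noted above).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
hidden_acts = rng.rand(1000, 128)          # stand-in for hidden-layer activations
labels = rng.randint(0, 10, size=1000)     # stand-in for MNIST class labels

# Two-level clustering: first split activations into coarse groups,
# then cluster within each group to obtain meta-level training data.
coarse = KMeans(n_clusters=10, random_state=0).fit_predict(hidden_acts)
meta_train = []
for c in np.unique(coarse):
    idx = np.where(coarse == c)[0]
    n_sub = min(2, len(idx))               # guard against very small groups
    fine = KMeans(n_clusters=n_sub, random_state=0).fit_predict(hidden_acts[idx])
    meta_train.append((idx, fine))

# Random Forests as base learners: their predictions on held-out activations
# form the meta-level test data that is later visualized per test instance.
train_idx, test_idx = train_test_split(np.arange(len(labels)), random_state=0)
base = RandomForestClassifier(n_estimators=100, random_state=0)
base.fit(hidden_acts[train_idx], labels[train_idx])
meta_test = base.predict_proba(hidden_acts[test_idx])
print(meta_test.shape)                     # (n_test_instances, n_classes)
```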