The widespread deployment of deep nets in practical applications has led to a growing desire to understand how and why such black-box methods make their predictions. Much work has focused on understanding which part of the input pattern (an image, say) is responsible for a particular class being predicted, and how the input may be manipulated to predict a different class. We focus instead on understanding which of the internal features computed by the neural net are responsible for a particular class. We achieve this by mimicking part of the neural net with an oblique decision tree having sparse weight vectors at the decision nodes. Using the recently proposed Tree Alternating Optimization (TAO) algorithm, we are able to learn trees that are both highly accurate and interpretable. Such trees can faithfully mimic the part of the neural net they replace, and hence they can provide insights into the deep net black box. Further, we show that we can easily manipulate the neural net features in order to make the net predict, or not predict, a given class, thus showing that it is possible to carry out adversarial attacks at the level of the features. These insights and manipulations apply globally to the entire training and test set, not just at a local (single-instance) level. We demonstrate this robustly on the MNIST and ImageNet datasets with LeNet5 and VGG networks.
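To make the "mimic" setup concrete, the following is a minimal sketch of fitting a tree surrogate to a network's internal features. It uses PyTorch/torchvision and scikit-learn as stand-ins: the paper's method uses sparse oblique trees trained with TAO, for which no standard library implementation is assumed here, so a plain axis-aligned DecisionTreeClassifier is used purely as an illustrative surrogate; the choice of VGG-16 and of the penultimate layer as the cut point are likewise assumptions for illustration.

```python
# Hedged sketch: mimic the final part of a deep net with a decision tree
# trained on the net's internal features (stand-in for the paper's TAO trees).
import torch
import torchvision.models as models
from sklearn.tree import DecisionTreeClassifier

# Pretrained VGG-16 as the black box (downloads ImageNet weights).
vgg = models.vgg16(weights="IMAGENET1K_V1").eval()

def penultimate_features(x):
    """Run the net up to its last hidden layer; the tree mimics what follows."""
    with torch.no_grad():
        z = vgg.features(x)
        z = vgg.avgpool(z).flatten(1)
        z = vgg.classifier[:-1](z)   # stop before the final linear layer
    return z.numpy()

# X: a batch of images; y: the classes predicted by the full network.
X = torch.randn(32, 3, 224, 224)    # placeholder inputs for the sketch
with torch.no_grad():
    y = vgg(X).argmax(dim=1).numpy()

# Fit a tree on the internal features so it reproduces the net's predictions;
# its splits then indicate which features drive each predicted class.
tree = DecisionTreeClassifier(max_depth=4).fit(penultimate_features(X), y)
print("mimic agreement:", tree.score(penultimate_features(X), y))
```

In the paper's setting the surrogate is a sparse oblique tree, so each decision node combines only a few features, which is what makes the feature-level explanations and manipulations described above tractable.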