Important insights into the explainability of neural networks reside in the characteristics of their decision boundaries. In this work, we borrow tools from the field of adversarial robustness and propose a new perspective that relates dataset features to the distance of samples to the decision boundary. This enables us to carefully tweak the position of the training samples and measure the induced changes in the decision boundaries of CNNs trained on large-scale vision datasets. We use this framework to reveal some intriguing properties of CNNs. Specifically, we rigorously confirm that neural networks exhibit high invariance to non-discriminative features, and show that the decision boundaries of a DNN can only exist as long as the classifier is trained with some features that hold them together. Finally, we show that the construction of the decision boundary is extremely sensitive to small perturbations of the training samples, and that changes in certain directions can lead to sudden invariances in the orthogonal ones. This is precisely the mechanism that adversarial training uses to achieve robustness.
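As a rough illustration of the kind of measurement this perspective relies on, the sketch below estimates a sample's distance to the decision boundary with a DeepFool-style iterative linearization borrowed from the adversarial-robustness toolbox. This is a minimal sketch under our own assumptions, not necessarily the exact procedure used in the paper; the function name `margin_estimate`, the `model` argument, and all hyperparameters are illustrative.

```python
import torch

def margin_estimate(model, x, num_steps=50, overshoot=0.02):
    """Estimate the L2 distance from a single input x (shape [1, C, H, W])
    to the model's decision boundary by iteratively stepping toward the
    linearized boundary of the closest competing class (DeepFool-style).
    Illustrative sketch; hyperparameters are assumptions."""
    model.eval()
    x0 = x.clone().detach()
    with torch.no_grad():
        orig_label = model(x0).argmax(dim=1).item()

    total_pert = torch.zeros_like(x0)
    x_adv = x0.clone().requires_grad_(True)
    for _ in range(num_steps):
        logits = model(x_adv)
        if logits.argmax(dim=1).item() != orig_label:
            break  # prediction flipped: we have crossed the boundary
        # closest competing class among the top-2 logits
        top2 = logits.topk(2, dim=1).indices[0]
        other = top2[1].item() if top2[0].item() == orig_label else top2[0].item()
        # signed margin between the original class and the competitor
        f = logits[0, orig_label] - logits[0, other]
        grad = torch.autograd.grad(f, x_adv)[0]
        # linearized step to the boundary: r = -f * w / ||w||^2
        step = f.detach() * grad / (grad.norm() ** 2 + 1e-12)
        total_pert = total_pert - (1 + overshoot) * step
        x_adv = (x0 + total_pert).detach().requires_grad_(True)

    # norm of the accumulated perturbation approximates the margin
    return total_pert.norm().item()
```

Averaging this quantity over a set of samples gives a per-feature or per-class margin statistic that one could track while perturbing the positions of the training samples, which is the type of measurement the framework described above is built around.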