In this paper, we investigate the impact of neural network (NN) topology on adversarial robustness. Specifically, we study the graph produced when an input traverses all the layers of an NN, and show that such graphs differ between clean and adversarial inputs. We find that graphs from clean inputs are more centralized around a small set of highway edges, whereas those from adversarial inputs are more diffuse, leveraging under-optimized edges. Through experiments on a variety of datasets and architectures, we show that these under-optimized edges are a source of adversarial vulnerability and that they can be used to detect adversarial inputs.
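To make the idea concrete, here is a minimal, hypothetical sketch, not the paper's actual method: for a toy MLP, treat each weight as a graph edge, score how strongly each edge contributes to a given input's forward pass, and summarize how concentrated that contribution mass is. The `edge_contributions` and `gini` helpers, the choice of the Gini coefficient as the concentration measure, and the thresholding idea are all illustrative assumptions.

```python
# A minimal sketch (not the authors' code) of the abstract's idea:
# score each edge's contribution to one input's forward pass, then
# measure how concentrated that mass is. The hypothesis is that clean
# inputs concentrate mass on a few "highway" edges, while adversarial
# inputs spread it across under-optimized edges.
import torch
import torch.nn as nn

def edge_contributions(mlp: nn.Sequential, x: torch.Tensor) -> torch.Tensor:
    """Return |W[i, j] * a[j]| for every edge of an MLP, as one flat vector."""
    contribs = []
    a = x
    with torch.no_grad():
        for layer in mlp:
            if isinstance(layer, nn.Linear):
                # Contribution of edge (j -> i) is |W[i, j] * a[j]|.
                contribs.append((layer.weight.abs() * a.abs().unsqueeze(0)).flatten())
            a = layer(a)
    return torch.cat(contribs)

def gini(v: torch.Tensor) -> torch.Tensor:
    """Gini coefficient of a non-negative vector: 0 = uniform, ~1 = concentrated."""
    v, _ = torch.sort(v)
    n = v.numel()
    idx = torch.arange(1, n + 1, dtype=v.dtype)
    return ((2 * idx - n - 1) * v).sum() / (n * v.sum() + 1e-12)

# Hypothetical usage: a lower concentration score (more diffuse edge usage)
# flags a candidate adversarial input; a threshold would be fit on clean data.
mlp = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
x = torch.rand(784)
score = gini(edge_contributions(mlp, x))
print(f"edge-usage concentration (Gini): {score:.3f}")
```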