We identify fragile and robust neurons of deep learning architectures using nodal dropouts of the first convolutional layer. Using an adversarial targeting algorithm, we correlate these neurons with the distribution of adversarial attacks on the network. Adversarial robustness of neural networks has gained significant attention in recent years, as it highlights intrinsic weaknesses of deep learning networks against carefully constructed distortions applied to input images. In this paper, we evaluate the robustness of state-of-the-art image classification models trained on the MNIST and CIFAR10 datasets against the fast gradient sign method (FGSM) attack, a simple yet effective way of deceiving neural networks. Our method identifies the specific neurons of a network that are most affected by the adversarial attack. We therefore propose to make fragile neurons more robust against these attacks by compressing features within robust neurons and amplifying the fragile neurons proportionally.
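For reference, FGSM perturbs an input in the direction of the sign of the loss gradient, $x_{\text{adv}} = x + \epsilon \cdot \operatorname{sign}\!\big(\nabla_x J(\theta, x, y)\big)$. The sketch below illustrates this standard attack, assuming a PyTorch classifier `model`, a labelled batch `(x, y)`, inputs scaled to $[0, 1]$, and an illustrative $\epsilon$; these are assumptions for exposition, not details taken from this work.

```python
# Minimal FGSM sketch: x_adv = x + epsilon * sign(grad_x loss(model(x), y)).
# `model`, the (x, y) batch, epsilon, and the [0, 1] pixel range are
# illustrative assumptions, not values from the paper.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.1):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step each pixel by epsilon in the direction that increases the loss.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```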