Deep neural networks (DNNs) are known to be vulnerable to adversarial noise: they are easily misled by adversarial samples into making wrong predictions. To alleviate this negative effect, in this paper we investigate the dependence between the outputs of the target model and input adversarial samples from the perspective of information theory, and propose an adversarial defense method. Specifically, we measure this dependence by estimating the mutual information (MI) between the outputs and the natural patterns of the inputs (called natural MI) and the MI between the outputs and the adversarial patterns of the inputs (called adversarial MI). We find that adversarial samples usually exhibit larger adversarial MI and smaller natural MI than natural samples. Motivated by this observation, we propose to enhance adversarial robustness by maximizing the natural MI and minimizing the adversarial MI during training. In this way, the target model is encouraged to pay more attention to the natural pattern, which carries the objective semantics. Empirical evaluations demonstrate that our method effectively improves adversarial accuracy against multiple attacks.
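The abstract does not specify which MI estimator is used or how the natural and adversarial patterns are extracted, so the following is only a minimal PyTorch sketch of the described training objective. It assumes a MINE-style (Donsker-Varadhan) lower bound on MI and, purely for illustration, treats the clean input as the natural pattern and the perturbation x_adv - x as the adversarial pattern; the names `MINEEstimator` and `mi_defense_loss` are hypothetical and not taken from the paper.

```python
# Sketch: MI-regularized adversarial training, assuming a MINE-style estimator.
import torch
import torch.nn as nn

class MINEEstimator(nn.Module):
    """Neural lower-bound estimator of MI(z; y) via the Donsker-Varadhan bound."""
    def __init__(self, z_dim, y_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + y_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, z, y):
        # Joint samples: matched (z, y) pairs.
        joint = self.net(torch.cat([z, y], dim=1)).mean()
        # Marginal samples: shuffle y to break the pairing.
        y_shuffled = y[torch.randperm(y.size(0), device=y.device)]
        marginal = torch.exp(self.net(torch.cat([z, y_shuffled], dim=1))).mean()
        return joint - torch.log(marginal + 1e-8)

def mi_defense_loss(model, mi_nat, mi_adv, x, x_adv, labels, lam=1.0):
    """Cross-entropy on adversarial samples plus the MI regularizer:
    maximize MI(outputs; natural pattern), minimize MI(outputs; adversarial pattern).
    The pattern definitions below are illustrative assumptions, not the paper's."""
    logits = model(x_adv)
    ce = nn.functional.cross_entropy(logits, labels)
    probs = logits.softmax(dim=1)
    natural_pattern = x.flatten(1)                 # assumed: clean input as natural pattern
    adversarial_pattern = (x_adv - x).flatten(1)   # assumed: perturbation as adversarial pattern
    natural_mi = mi_nat(probs, natural_pattern)
    adversarial_mi = mi_adv(probs, adversarial_pattern)
    # Minimizing this loss maximizes natural MI and minimizes adversarial MI.
    return ce - lam * natural_mi + lam * adversarial_mi
```

In an actual training loop, the two estimators would be instantiated with the model's output dimension and the flattened pattern dimension, and optimized jointly with the classifier so that the MI lower bounds stay tight while the classifier is pushed toward the natural pattern.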