This paper mathematically derives an analytic solution for the adversarial perturbation on a ReLU network and theoretically explains the difficulty of adversarial training. Specifically, we formulate the dynamics of the adversarial perturbation generated by a multi-step attack, which shows that the perturbation tends to strengthen the eigenvectors corresponding to a few top-ranked eigenvalues of the Hessian matrix of the loss w.r.t. the input. We also prove that adversarial training exponentially strengthens the influence of unconfident input samples with large gradient norms. In addition, we find that adversarial training strengthens the influence of the Hessian matrix of the loss w.r.t. network parameters, which makes the training more likely to oscillate along the directions of a few samples and further increases the difficulty of adversarial training. Crucially, our proofs provide a unified explanation for previous findings on understanding adversarial training.
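The claim that a multi-step attack strengthens the top-ranked Hessian eigenvectors can be illustrated with a minimal toy sketch (not the paper's derivation): on a quadratic loss with a fixed Hessian `H`, norm-constrained multi-step gradient ascent behaves like power iteration, so the perturbation aligns with the eigenvector of the largest eigenvalue. All names and parameters below (`d`, `alpha`, the number of steps) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10

# Toy stand-in for the Hessian of the loss w.r.t. the input:
# a random symmetric positive semi-definite matrix.
A = rng.standard_normal((d, d))
H = A @ A.T
eigvals, eigvecs = np.linalg.eigh(H)
top = eigvecs[:, -1]  # eigenvector of the largest eigenvalue

# Multi-step attack on the quadratic loss L(delta) = 0.5 * delta^T H delta,
# projecting the perturbation back to the unit sphere after each step.
delta = rng.standard_normal(d)
delta /= np.linalg.norm(delta)
alpha = 0.1
for _ in range(200):
    grad = H @ delta                 # gradient of the loss w.r.t. the perturbation
    delta = delta + alpha * grad     # gradient-ascent attack step
    delta /= np.linalg.norm(delta)   # norm constraint

# Alignment between the final perturbation and the top Hessian eigenvector.
cos = abs(top @ delta)
print(round(cos, 4))
```

Because each step multiplies the perturbation by `(I + alpha * H)` and renormalizes, the iteration is exactly power iteration, and the cosine similarity with the top eigenvector approaches 1; on a real ReLU network the Hessian is only piecewise constant, so this sketch captures the local dynamics only.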