An acknowledged weakness of neural networks is their vulnerability to adversarial perturbations of their inputs. One of the most popular mechanisms for improving the robustness of these models is adversarial training, which alternately maximizes the loss over constrained perturbations of the inputs (also called adversaries) via projected gradient ascent and minimizes it over the network weights. In this paper, we analyze the dynamics of the maximization step to understand the experimentally observed effectiveness of this defense mechanism. Specifically, we investigate the non-concave landscape of the adversaries for a two-layer neural network with a quadratic loss. Our main result proves that projected gradient ascent finds a local maximum of this non-concave problem in a polynomial number of iterations with high probability. To our knowledge, this is the first work that provides a convergence analysis of first-order adversaries. Moreover, our analysis demonstrates that, in the initial phase of adversarial training, the scale of the inputs matters: a smaller input scale leads to faster convergence of adversarial training and a "more regular" landscape. Finally, we show that these theoretical findings are in excellent agreement with a series of experiments.
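The inner maximization step described above can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the paper's exact setup: it instantiates a two-layer ReLU network f(x) = aᵀ relu(Wx) with a quadratic loss (f(x) − y)², and runs projected gradient ascent on the perturbation, projecting onto an L∞ ball of radius eps after each step. The function name `pgd_inner_max` and all hyperparameter values are illustrative choices, not from the source.

```python
import numpy as np

def pgd_inner_max(x0, y, W, a, eps=0.1, eta=0.01, steps=40, rng=None):
    """Projected gradient ascent on the quadratic loss of a two-layer
    ReLU network f(x) = a^T relu(W x), maximizing over perturbations
    delta with ||delta||_inf <= eps (the inner step of adversarial
    training). A toy sketch; the paper's precise setting may differ."""
    rng = np.random.default_rng() if rng is None else rng
    # Random start inside the constraint set.
    delta = rng.uniform(-eps, eps, size=x0.shape)
    for _ in range(steps):
        x = x0 + delta
        h = W @ x                                         # pre-activations
        residual = 2.0 * (a @ np.maximum(h, 0.0) - y)     # d loss / d f
        # Gradient of the loss w.r.t. x, back through the ReLU.
        grad_x = residual * (W.T @ (a * (h > 0)))
        delta = delta + eta * grad_x                      # ascent step
        delta = np.clip(delta, -eps, eps)                 # project onto L-inf ball
    return x0 + delta
```

In the full adversarial-training loop, the perturbed point returned here would feed the outer minimization over the weights W and a.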