Improving the robustness of deep neural networks (DNNs) to adversarial examples is an important yet challenging problem for secure deep learning. Among existing defense techniques, adversarial training with Projected Gradient Descent (PGD) is one of the most effective. Adversarial training solves a min-max optimization problem, with the \textit{inner maximization} generating adversarial examples by maximizing the classification loss, and the \textit{outer minimization} finding model parameters by minimizing the loss on the adversarial examples generated in the inner maximization. A criterion that measures how well the inner maximization is solved is therefore crucial for adversarial training. In this paper, we propose such a criterion, namely the First-Order Stationary Condition for constrained optimization (FOSC), to quantitatively evaluate the convergence quality of the adversarial examples found in the inner maximization. Using FOSC, we find that to ensure better robustness, it is essential to use adversarial examples with better convergence quality at the \textit{later stages} of training, whereas at the early stages high-convergence-quality adversarial examples are not necessary and may even lead to poor robustness. Based on these observations, we propose a \textit{dynamic} training strategy that gradually increases the convergence quality of the generated adversarial examples, which significantly improves the robustness of adversarial training. Our theoretical and empirical results show the effectiveness of the proposed method.
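To make the min-max structure and the FOSC criterion concrete, the following is a minimal PyTorch sketch, not the paper's implementation: the function names, hyperparameters (\texttt{eps}, \texttt{alpha}, \texttt{steps}), and the closed form used for FOSC (which assumes an $L_\infty$ constraint ball around the clean input) are illustrative assumptions.

\begin{verbatim}
import torch
import torch.nn.functional as F

def pgd_inner_max(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Sketch of the PGD inner maximization: ascend the classification
    # loss while staying inside the L-infinity ball of radius eps.
    x_adv = (x.clone().detach()
             + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()            # gradient-ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project onto eps-ball
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

def fosc(model, x, x_adv, y, eps=8/255):
    # Hedged sketch of the FOSC criterion for an L-infinity ball:
    # c(x_adv) = eps * ||grad||_1 - <x_adv - x, grad>, where grad is the
    # loss gradient at x_adv. c >= 0, and a value closer to 0 indicates
    # a better-converged inner maximization.
    x_adv = x_adv.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    flat_g = grad.flatten(1)
    flat_d = (x_adv - x).flatten(1)
    return eps * flat_g.abs().sum(dim=1) - (flat_d * flat_g).sum(dim=1)
\end{verbatim}

Under this sketch, the dynamic training strategy would simply loosen the FOSC target (allow larger values, e.g. fewer PGD steps) early in training and tighten it toward zero at later stages.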