Deep Neural Networks (DNNs) have recently achieved great success in many classification tasks. Unfortunately, they are vulnerable to adversarial attacks that craft adversarial examples with small perturbations to fool DNN models, especially in model-sharing scenarios. Adversarial training, which injects adversarial examples into model training, has proven to be the most effective strategy for improving the robustness of DNN models to adversarial attacks. However, adversarial training based on existing adversarial examples fails to generalize well to standard, unperturbed test data. To achieve a better trade-off between standard accuracy and adversarial robustness, we propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining (LADDER), which adversarially trains DNN models on latent boundary-guided adversarial examples. In contrast to most existing methods, which generate adversarial examples in the input space, LADDER generates a myriad of high-quality adversarial examples by adding perturbations to latent features. The perturbations are made along the normal of the decision boundary constructed by an SVM with an attention mechanism. We analyze the merits of the generated boundary-guided adversarial examples from a boundary-field perspective and through visualization. Extensive experiments and detailed analysis on MNIST, SVHN, CelebA, and CIFAR-10 validate the effectiveness of LADDER in achieving a better trade-off between standard accuracy and adversarial robustness compared with vanilla DNNs and competitive baselines.
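To make the core idea concrete, the following is a minimal sketch, not the authors' implementation, of perturbing latent features along the normal of a linear SVM decision boundary. The toy latent vectors, the `epsilon` step size, and the use of `LinearSVC` are illustrative assumptions; LADDER additionally employs an attention mechanism and decodes the perturbed latents back to the input space.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Toy latent features for two classes (stand-ins for encoder outputs).
z_pos = rng.normal(loc=1.0, scale=0.5, size=(50, 8))
z_neg = rng.normal(loc=-1.0, scale=0.5, size=(50, 8))
Z = np.vstack([z_pos, z_neg])
y = np.array([1] * 50 + [0] * 50)

# Fit a linear SVM in the latent space; for a linear classifier
# w^T z + b = 0, the weight vector w is the boundary normal.
svm = LinearSVC(C=1.0, max_iter=10000).fit(Z, y)
normal = svm.coef_[0] / np.linalg.norm(svm.coef_[0])

# Perturb each latent vector along the boundary normal, pushing it
# toward (and possibly across) the decision boundary.
epsilon = 0.5  # illustrative perturbation magnitude
direction = np.where(y[:, None] == 1, -normal, normal)
Z_adv = Z + epsilon * direction

# In the full framework, Z_adv would be decoded by a generator into
# adversarial examples and injected into adversarial training.
print(Z_adv.shape)
```

The key design point reflected here is that the perturbation direction is determined by the geometry of the decision boundary in latent space rather than by input-space gradients.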