Since the discovery of adversarial attacks, robustness has become an essential requirement for deep learning-based systems. Adversarial training with first-order attacks remains one of the most effective defenses against adversarial perturbations to date. Most adversarial training approaches iteratively perturb each pixel using the gradient of the loss function with respect to the input image. However, adversarial training with gradient-based attacks lacks diversity and generalizes poorly to natural images and to other attacks. This study presents a robust training algorithm in which adversarial perturbations are automatically synthesized from a random vector by a generator network. The classifier is trained with a cross-entropy loss regularized by the optimal transport distance between the representations of the natural and synthesized adversarial samples. Unlike prevailing generative defenses, the proposed one-step attack generation framework synthesizes diverse perturbations without using the gradient of the classifier's loss. Experimental results show that the proposed approach attains robustness comparable to various gradient-based and generative robust training techniques on the CIFAR10, CIFAR100, and SVHN datasets. In addition, compared to the baselines, the proposed robust training framework generalizes better to natural samples. Code and trained models will be made publicly available.
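The optimal transport regularizer mentioned above can be illustrated with a minimal sketch. The abstract does not specify which OT solver is used, so this example assumes an entropic (Sinkhorn) approximation over two batches of feature representations with uniform marginals; the function name `sinkhorn_ot`, the batch sizes, and the regularization strength `eps` are illustrative choices, not the paper's actual implementation.

```python
import numpy as np

def sinkhorn_ot(X, Y, eps=1.0, n_iters=200):
    """Entropic-regularized OT cost between two batches of feature
    vectors (rows of X and Y), assuming uniform marginals.
    This is a sketch of the kind of distance used to regularize the
    cross-entropy loss, not the paper's exact solver."""
    n, m = X.shape[0], Y.shape[0]
    # Squared Euclidean cost matrix between all pairs of features.
    C = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1) ** 2
    K = np.exp(-C / eps)                  # Gibbs kernel
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    v = np.ones(m)
    # Sinkhorn iterations: alternately rescale rows and columns.
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]       # approximate transport plan
    return float(np.sum(P * C))           # transport cost under P
```

In a training loop, the total objective would then take the hedged form `loss = cross_entropy(logits_natural) + lam * sinkhorn_ot(features_natural, features_adversarial)`, where `lam` is a hypothetical regularization weight.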