Deep Convolutional Neural Networks (CNNs) can easily be fooled by subtle, imperceptible changes to the input images. To address this vulnerability, adversarial training creates perturbation patterns and includes them in the training set to robustify the model. In contrast to existing adversarial training methods that only use class-boundary information (e.g., using a cross-entropy loss), we propose to exploit additional information from the feature space to craft stronger adversaries that are in turn used to learn a robust model. Specifically, we use the style and content information of a target sample from another class, alongside its class-boundary information, to create adversarial perturbations. We apply our proposed multi-task objective in a deeply supervised manner, extracting multi-scale feature knowledge to create maximally separating adversaries. Subsequently, we propose a max-margin adversarial training approach that minimizes the distance between the source image and its adversary and maximizes the distance between the adversary and the target image. Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses, generalizes well to naturally occurring corruptions and data distributional shifts, and retains the model's accuracy on clean examples.
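The max-margin idea described above can be illustrated with a small sketch. The abstract does not state the exact loss, so the triplet-style hinge form below, the Euclidean distance, the `margin` value, and the names `l2_distance` and `max_margin_loss` are all assumptions made for illustration rather than the method's actual formulation. The feature vectors are toy values, not outputs of a real network.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def max_margin_loss(source_feat, adversary_feat, target_feat, margin=1.0):
    """Triplet-style hinge loss (an assumed form, not the paper's exact objective).

    The adversary's features act as the anchor: the loss decreases as they
    move toward the source image's features and away from the target's.
    """
    d_source = l2_distance(adversary_feat, source_feat)
    d_target = l2_distance(adversary_feat, target_feat)
    return max(0.0, d_source - d_target + margin)


# Toy 2-D features, made up for illustration only.
source = [0.0, 0.0]
target = [4.0, 0.0]
adv_near_target = [3.0, 0.0]  # adversary whose features sit close to the target
adv_near_source = [0.5, 0.0]  # adversary whose features sit close to the source

loss_before = max_margin_loss(source, adv_near_target, target)
loss_after = max_margin_loss(source, adv_near_source, target)
print(loss_before, loss_after)
```

Under these assumptions, the loss is positive when the adversary's features lie closer to the target than to the source, and it drops to zero once the adversary is closer to the source by at least the margin. In practice the features would come from one or more layers of the network being trained.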