Although current deep learning techniques achieve superior performance on various computer vision tasks, they remain vulnerable to adversarial examples. Adversarial training and its variants have proven to be the most effective defenses against such examples. These methods typically regularize the difference between the output probabilities of an adversarial example and its corresponding natural example. However, this regularization can be harmful when the model misclassifies the natural example. To circumvent this issue, we propose a novel adversarial training scheme that encourages the model to produce similar outputs for an adversarial example and its ``inverse adversarial'' counterpart, where the latter is generated to maximize the likelihood in the neighborhood of the natural example. Extensive experiments on various vision datasets and architectures demonstrate that our training method achieves state-of-the-art robustness as well as natural accuracy. Furthermore, using a universal version of inverse adversarial examples, we improve the performance of single-step adversarial training techniques at low additional computational cost.
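To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of the training objective described above, written in PyTorch. It assumes a PGD-style inner loop, where the adversarial example ascends the loss and the ``inverse adversarial'' example descends it within the same perturbation budget, and a KL-divergence consistency term with a hypothetical weight `beta`; the function and parameter names are illustrative only.

```python
import torch
import torch.nn.functional as F

def perturb(model, x, y, eps, step, iters, maximize_loss):
    """PGD-style perturbation: maximize_loss=True yields a standard adversarial
    example; False yields the 'inverse adversarial' counterpart that increases
    the likelihood of the true label near the natural example (assumption)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        direction = grad.sign() if maximize_loss else -grad.sign()
        delta = (delta + step * direction).clamp(-eps, eps).detach().requires_grad_(True)
    return (x + delta).clamp(0, 1).detach()

def training_loss(model, x, y, eps=8/255, step=2/255, iters=10, beta=1.0):
    # Generate both perturbed views of the natural input.
    x_adv = perturb(model, x, y, eps, step, iters, maximize_loss=True)
    x_inv = perturb(model, x, y, eps, step, iters, maximize_loss=False)
    logits_adv, logits_inv = model(x_adv), model(x_inv)
    # Classification loss on the adversarial example, plus a consistency term
    # pulling its output distribution toward that of the inverse adversarial
    # example, which stands in for the possibly misclassified natural example.
    ce = F.cross_entropy(logits_adv, y)
    kl = F.kl_div(F.log_softmax(logits_adv, dim=1),
                  F.softmax(logits_inv.detach(), dim=1),
                  reduction="batchmean")
    return ce + beta * kl
```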