The vulnerability of Deep Neural Networks to Adversarial Attacks has fuelled research towards building robust models. While most Adversarial Training algorithms aim at defending against attacks constrained within low-magnitude Lp norm bounds, real-world adversaries are not limited by such constraints. In this work, we aim to achieve adversarial robustness within larger bounds, against perturbations that may be perceptible, but do not change human (or Oracle) prediction. Within such larger bounds, the coexistence of perturbed images that flip Oracle predictions and those that do not makes this a challenging setting for adversarial robustness. We discuss the ideal goals of an adversarial defense algorithm beyond perceptual limits, and further highlight the shortcomings of naively extending existing training algorithms to higher perturbation bounds. To overcome these shortcomings, we propose a novel defense, Oracle-Aligned Adversarial Training (OA-AT), which aligns the predictions of the network with those of an Oracle during adversarial training. The proposed approach achieves state-of-the-art performance at large epsilon bounds (such as an L-inf bound of 16/255 on CIFAR-10) while also outperforming existing defenses (AWP, TRADES, PGD-AT) at the standard bound of 8/255.
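For context, the following is a minimal PyTorch sketch of the standard PGD adversarial training baseline (PGD-AT) referenced above, at the standard L-inf bound of 8/255; the step size, step count, and function names are illustrative assumptions, and the Oracle-alignment component of OA-AT is not shown.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Generate L-inf bounded adversarial examples via Projected Gradient Descent.
    # alpha (step size) and steps are illustrative hyperparameter choices.
    delta = torch.empty_like(x).uniform_(-eps, eps)  # random start within the L-inf ball
    delta.requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(torch.clamp(x + delta, 0, 1)), y)
        grad, = torch.autograd.grad(loss, delta)
        # ascend the loss, then project back onto the L-inf ball of radius eps
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
        delta.requires_grad_(True)
    return torch.clamp(x + delta, 0, 1).detach()

def pgd_at_step(model, optimizer, x, y, eps=8/255):
    # One PGD-AT training step: craft adversarial examples against the
    # current model, then minimize the loss on those examples.
    model.eval()
    x_adv = pgd_attack(model, x, y, eps=eps)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Naively raising eps in such a sketch to a large bound like 16/255 is exactly the regime where perturbations can become perceptible and may flip Oracle predictions, which motivates the Oracle-aligned training objective proposed in this work.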