A key challenge in adversarial robustness is the lack of a precise mathematical characterization of human perception, which underlies the very definition of adversarial attacks as perturbations imperceptible to human eyes. Most current attacks and defenses sidestep this issue by considering restrictive adversarial threat models, such as those bounded by $L_2$ or $L_\infty$ distance, spatial perturbations, etc. However, models that are robust against any one of these restrictive threat models remain fragile against the others. To resolve this issue, we propose adversarial training against the set of all imperceptible adversarial examples, approximated using deep neural networks. We call this threat model the neural perceptual threat model (NPTM); it includes adversarial examples with a bounded neural perceptual distance (a neural network-based approximation of the true perceptual distance) to natural images. Through an extensive perceptual study, we show that the neural perceptual distance correlates well with human judgements of the perceptibility of adversarial examples, validating our threat model. Under the NPTM, we develop novel perceptual adversarial attacks and defenses. Because the NPTM is very broad, we find that Perceptual Adversarial Training (PAT) against a perceptual attack gives robustness against many other types of adversarial attacks. We test PAT on CIFAR-10 and ImageNet-100 against five diverse adversarial attacks. We find that PAT achieves state-of-the-art robustness against the union of these five attacks, more than doubling the accuracy over the next best model, without training against any of them. That is, PAT generalizes well to unforeseen perturbation types. This is vital in sensitive applications where a particular threat model cannot be assumed, and to the best of our knowledge, PAT is the first adversarial defense with this property.
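To make the NPTM concrete, the following is a minimal sketch of the membership test it implies: an image pair belongs to the threat model when its neural perceptual distance is at most some bound $\epsilon$. It assumes, as one plausible instantiation of the abstract's "neural network-based approximation," the LPIPS distance of Zhang et al. (2018) via the public `lpips` package; the helper name `is_in_nptm` and the bound `eps` are illustrative, not from the paper.

```python
# Sketch only: checks whether adversarial images fall inside a neural
# perceptual threat model, here instantiated with LPIPS (an assumption,
# not necessarily the authors' exact configuration).
import torch
import lpips

# LPIPS with AlexNet features; inputs are float tensors in [-1, 1],
# shaped (N, 3, H, W).
perceptual_distance = lpips.LPIPS(net='alex')

def is_in_nptm(x_natural: torch.Tensor,
               x_adv: torch.Tensor,
               eps: float = 0.5) -> torch.Tensor:
    """Return a boolean per image: is x_adv within neural perceptual
    distance eps of x_natural? The value of eps is illustrative."""
    with torch.no_grad():
        d = perceptual_distance(x_natural, x_adv).view(-1)  # one distance per pair
    return d <= eps
```

A perceptual attack would then maximize the classifier's loss subject to this constraint, and PAT would train on the resulting examples, in the same way that $L_\infty$ adversarial training uses PGD within an $L_\infty$ ball.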