A key challenge in adversarial robustness is the lack of a precise mathematical characterization of human perception, which is used in the very definition of adversarial attacks that are imperceptible to human eyes. Most current attacks and defenses try to avoid this issue by considering restrictive adversarial threat models such as those bounded by $L_2$ or $L_\infty$ distance, spatial perturbations, etc. However, models that are robust against any of these restrictive threat models are still fragile against other threat models. To resolve this issue, we propose adversarial training against the set of all imperceptible adversarial examples, approximated using deep neural networks. We call this threat model the neural perceptual threat model (NPTM); it includes adversarial examples with a bounded neural perceptual distance (a neural network-based approximation of the true perceptual distance) to natural images. Through an extensive perceptual study, we show that the neural perceptual distance correlates well with human judgements of perceptibility of adversarial examples, validating our threat model. Under the NPTM, we develop novel perceptual adversarial attacks and defenses. Because the NPTM is very broad, we find that Perceptual Adversarial Training (PAT) against a perceptual attack gives robustness against many other types of adversarial attacks. We test PAT on CIFAR-10 and ImageNet-100 against five diverse adversarial attacks. We find that PAT achieves state-of-the-art robustness against the union of these five attacks, more than doubling the accuracy over the next best model, without training against any of them. That is, PAT generalizes well to unforeseen perturbation types. This is vital in sensitive applications where a particular threat model cannot be assumed, and to the best of our knowledge, PAT is the first adversarial training defense with this property.
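To make the threat model concrete, here is a minimal, hypothetical sketch of a neural perceptual distance and the NPTM membership test. It assumes an LPIPS-style construction: the activations of each layer of a feature network are L2-normalized, concatenated into a vector $\phi(x)$, and two images are compared by $\|\phi(x) - \phi(\tilde{x})\|_2$. A perturbed input $\tilde{x}$ then lies in the threat model when this distance is at most a bound $\epsilon$. The names `toy_features`, `neural_perceptual_distance`, and `in_threat_model` are illustrative assumptions, and `toy_features` is a tiny hand-written stand-in for a trained network's intermediate layers. This is not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
# A feature extractor maps an input to a list of per-layer activation vectors.
FeatureExtractor = Callable[[Sequence[float]], List[Vector]]


def normalize(v: Vector, eps: float = 1e-10) -> Vector:
    """Scale a layer's activations to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / (norm + eps) for a in v]


def phi(x: Sequence[float], features: FeatureExtractor) -> Vector:
    """Normalize each layer's activations, then concatenate all layers."""
    return [a for layer in features(x) for a in normalize(layer)]


def neural_perceptual_distance(
    x: Sequence[float], x_adv: Sequence[float], features: FeatureExtractor
) -> float:
    """Euclidean distance between the normalized feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(phi(x, features), phi(x_adv, features))))


def in_threat_model(
    x: Sequence[float], x_adv: Sequence[float], features: FeatureExtractor, epsilon: float
) -> bool:
    """True if x_adv is within perceptual distance epsilon of the natural input x."""
    return neural_perceptual_distance(x, x_adv, features) <= epsilon


def toy_features(x: Sequence[float]) -> List[Vector]:
    """Hypothetical two-layer ReLU 'network' standing in for a trained model."""
    layer1 = [max(0.0, x[0] - x[1]), max(0.0, x[1] + x[2]), max(0.0, x[0] + x[2])]
    layer2 = [max(0.0, layer1[0] + layer1[1]), max(0.0, layer1[1] - layer1[2])]
    return [layer1, layer2]
```

For example, `neural_perceptual_distance([0.5, 0.2, 0.9], [0.9, 0.2, 0.5], toy_features)` is roughly 0.32, so that perturbation lies inside the ball for $\epsilon = 0.5$ but not for $\epsilon = 0.1$. Because every normalized layer has norm at most 1, the distance is bounded by $2\sqrt{L}$ for $L$ layers, which keeps $\epsilon$ on a comparable scale across inputs.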