Adversarial training is the de facto most promising defense against adversarial examples. Yet, its passive nature inevitably prevents it from being immune to unknown attackers. To achieve a proactive defense, we need a more fundamental understanding of adversarial examples, beyond the popular bounded threat model. In this paper, we provide a causal viewpoint of adversarial vulnerability: the cause is the confounder that ubiquitously exists in learning, and attackers are precisely exploiting the confounding effect. Therefore, a fundamental solution for adversarial robustness is causal intervention. As the confounder is unobserved in general, we propose to use the instrumental variable, which achieves intervention without requiring observation of the confounder. We term our robust training method Causal intervention by instrumental Variable (CiiV). It consists of a differentiable retinotopic sampling layer and a consistency loss, is stable to train, and is guaranteed not to suffer from gradient obfuscation. Extensive experiments with a wide spectrum of attackers and settings on the MNIST, CIFAR-10, and mini-ImageNet datasets empirically demonstrate that CiiV is robust to adaptive attacks.
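To make the two named components concrete, below is a minimal sketch of what a differentiable retinotopic-style sampling layer paired with a consistency loss could look like in PyTorch. This is not the authors' CiiV implementation: the class and function names (`RetinotopicSampler`, `consistency_loss`), the grid-jitter sampling scheme, and the KL-based agreement term are all illustrative assumptions; the sketch only shows how externally injected sampling randomness (the instrument) can be combined with a loss that forces predictions on differently sampled views to agree.

```python
# Hypothetical sketch of a retinotopic-style sampling layer + consistency loss.
# NOT the authors' CiiV code; the sampling scheme and names are assumptions.
import torch
import torch.nn.functional as F


class RetinotopicSampler(torch.nn.Module):
    """Randomly resamples the input image on each forward pass.

    Every call draws a fresh sampling grid, so repeated passes over the
    same image produce different "views" (the instrumental randomness),
    while grid_sample keeps the operation differentiable.
    """

    def __init__(self, jitter: float = 0.05):
        super().__init__()
        self.jitter = jitter  # magnitude of the random grid perturbation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n = x.shape[0]
        # Identity sampling grid in normalized [-1, 1] coordinates.
        theta = torch.eye(2, 3, device=x.device).unsqueeze(0).repeat(n, 1, 1)
        grid = F.affine_grid(theta, x.shape, align_corners=False)
        # Per-pixel random jitter: the externally injected randomness.
        grid = grid + self.jitter * torch.randn_like(grid)
        return F.grid_sample(x, grid, align_corners=False)


def consistency_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    """Penalizes disagreement between predictions on two sampled views."""
    return F.kl_div(
        F.log_softmax(logits_a, dim=1),
        F.softmax(logits_b, dim=1),
        reduction="batchmean",
    )


# Usage: classify two independently sampled views of the same batch and
# add the consistency term to the ordinary cross-entropy loss.
sampler = RetinotopicSampler()
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
images, labels = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
logits_a, logits_b = model(sampler(images)), model(sampler(images))
loss = F.cross_entropy(logits_a, labels) + consistency_loss(logits_a, logits_b)
loss.backward()
```

Because the randomness enters through a plain differentiable resampling step rather than through non-differentiable input preprocessing, gradients flow cleanly through the sampler, which is consistent with the abstract's claim that the method does not rely on gradient obfuscation.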