The origin of adversarial examples remains unexplained and, despite comprehensive investigation, continues to provoke debate from various viewpoints. In this paper, we propose adversarial instrumental variable (IV) regression, a way of examining the unexpected vulnerability of adversarially trained networks from a causal perspective. With it, we estimate the causal relation of adversarial prediction in an unbiased setting dissociated from unknown confounders. Our approach aims to demystify the inherent causal features of adversarial examples through a zero-sum optimization game between a causal feature estimator (i.e., hypothesis model) and worst-case counterfactuals (i.e., test function) that hinder the search for causal features. Through extensive analyses, we demonstrate that the estimated causal features are highly related to the correct prediction for adversarial robustness, and that the counterfactuals exhibit extreme features deviating significantly from the correct prediction. In addition, we show how to effectively inoculate CAusal FEatures (CAFE) into defense networks to improve adversarial robustness.
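For readers unfamiliar with instrumental variables, the sketch below illustrates the general idea that IV regression can recover a causal effect when an unobserved confounder biases ordinary regression. It uses classical linear two-stage least squares on synthetic data, which is a stand-in and not the adversarial IV regression proposed here; the function names, the data-generating process, and the true effect of 2.0 are all assumptions made for illustration.

```python
import numpy as np


def ols_slope(x, y):
    """Slope of an ordinary least-squares fit of y on x (with intercept)."""
    design = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(design, y, rcond=None)[0][1]


def two_stage_least_squares(z, x, y):
    """Scalar effect of x on y, estimated with instrument z (hypothetical helper)."""
    # Stage 1: project the treatment x onto the instrument z.
    z_design = np.column_stack([np.ones_like(z), z])
    x_hat = z_design @ np.linalg.lstsq(z_design, x, rcond=None)[0]
    # Stage 2: regress the outcome on the projected treatment only.
    return ols_slope(x_hat, y)


# Synthetic data (assumed for illustration): u confounds both x and y,
# while z influences y only through x.
rng = np.random.default_rng(0)
n = 20000
u = rng.normal(size=n)                           # unobserved confounder
z = rng.normal(size=n)                           # instrument
x = z + u + 0.1 * rng.normal(size=n)             # treatment
y = 2.0 * x + 3.0 * u + 0.1 * rng.normal(size=n) # outcome, true effect 2.0

beta_ols = ols_slope(x, y)                 # biased upward by the confounder
beta_iv = two_stage_least_squares(z, x, y) # close to the true effect
```

In this toy setting the ordinary regression slope absorbs the confounder's influence, whereas the IV estimate stays near the assumed true effect. The abstract's method replaces the two linear stages with a zero-sum game between a hypothesis model and a test function, which this sketch does not attempt to reproduce.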