This paper provides a unified view to explain different adversarial attacks and defense methods, \emph{i.e.}, the perspective of multi-order interactions between input variables of DNNs. Based on the multi-order interaction, we discover that adversarial attacks mainly affect high-order interactions to fool the DNN. Furthermore, we find that the robustness of adversarially trained DNNs comes from category-specific low-order interactions. Our findings provide a potential way to unify adversarial perturbations and robustness, which can explain existing defense methods in a principled manner. Moreover, our findings also revise the previous, inaccurate understanding of the shape bias of adversarially learned features.
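For reference, the following is a minimal sketch of the multi-order interaction on which this view is built, following the standard game-theoretic definition used in the interaction literature; the notation $N$, $S$, $f$, and $\Delta f$ below is our assumption for illustration, not fixed by this abstract. Given a DNN $f$ and an input with variables $N=\{1,\dots,n\}$, the $m$-order interaction between variables $i$ and $j$ measures their average marginal interaction over contexts $S$ containing $m$ other variables:
\begin{equation}
I^{(m)}(i,j) \;=\; \mathbb{E}_{S\subseteq N\setminus\{i,j\},\,|S|=m}\big[\Delta f(i,j,S)\big],
\end{equation}
where $\Delta f(i,j,S) = f(S\cup\{i,j\}) - f(S\cup\{i\}) - f(S\cup\{j\}) + f(S)$. Under this definition, a low order $m$ corresponds to interactions modeled within small, local contexts, while a high order $m$ corresponds to interactions relying on large, global contexts.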