While achieving great success in rich real-life applications, deep neural network (DNN) models have long been criticized for their vulnerability to adversarial attacks. Tremendous research efforts have been dedicated to mitigating the threats of adversarial attacks, but the essential trait of adversarial examples is not yet clear; most existing methods remain vulnerable to hybrid attacks and suffer from counterattacks. In light of this, in this paper, we first reveal a gradient-based correlation between sensitivity analysis-based DNN interpreters and the generation process of adversarial examples. This correlation indicates the Achilles' heel of adversarial attacks and sheds light on linking together two long-standing challenges of DNNs: fragility and unexplainability. We then propose an interpreter-based ensemble framework called X-Ensemble for robust adversary defense. X-Ensemble adopts a novel detection-rectification process and features the construction of multiple sub-detectors and a rectifier upon various types of interpretation information toward target classifiers. Moreover, X-Ensemble employs the Random Forests (RF) model to combine the sub-detectors into an ensemble detector for defense against hybrid adversarial attacks. The non-differentiable property of RF further makes it a valuable choice against the counterattacks of adversaries. Extensive experiments under various types of state-of-the-art attacks and diverse attack scenarios demonstrate the advantages of X-Ensemble over competitive baseline methods.
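The gradient-based correlation claimed above can be illustrated on a toy model: a sensitivity analysis-based interpreter (a saliency map) and a gradient attack such as FGSM both read off the same input gradient. The following is a minimal sketch under hypothetical weights, not the paper's actual classifier or interpreter:

```python
import numpy as np

# Toy logistic "classifier": p(y=1|x) = sigmoid(w.x + b).
# Hypothetical weights for illustration only.
w = np.array([0.8, -1.2, 0.5])
b = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(x, y):
    """Gradient of the cross-entropy loss w.r.t. the input x.
    For logistic regression, dL/dx = (p - y) * w."""
    p = sigmoid(w @ x + b)
    return (p - y) * w

x = np.array([1.0, 2.0, -0.5])
y = 1.0
g = input_gradient(x, y)

# A sensitivity-analysis interpreter reports this same gradient
# (in magnitude) as the feature saliency map.
saliency = np.abs(g)

# An FGSM-style attack crafts its perturbation from this same gradient.
eps = 0.1
x_adv = x + eps * np.sign(g)
```

The coordinates the interpreter ranks as most salient are exactly those the attack perturbs, which is the shared-gradient link that the detection side of X-Ensemble exploits.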
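The ensemble-detection idea can likewise be sketched: per-input suspicion scores from several sub-detectors are fused by a Random Forest, whose tree structure exposes no gradient for an attacker to backpropagate through. The sub-detector scores below are synthetic stand-ins, not the interpretation-based scores of the actual framework:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical scores from 3 sub-detectors (rows = inputs, cols =
# sub-detectors); clean inputs score low, adversarial inputs high.
n = 200
clean = rng.normal(0.2, 0.1, size=(n, 3))
adv = rng.normal(0.7, 0.1, size=(n, 3))
X = np.vstack([clean, adv])
y = np.concatenate([np.zeros(n), np.ones(n)])  # 1 = adversarial

# A Random Forest fuses the sub-detectors into one ensemble detector;
# being non-differentiable, it resists gradient-based counterattacks.
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
pred = rf.predict(X)
```

On this near-separable synthetic data the fused detector separates clean from adversarial inputs almost perfectly; the design point is that, unlike a neural sub-detector, the forest cannot be attacked by differentiating through it.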