Given the increasing threat of adversarial attacks on deep neural networks (DNNs), research on efficient detection methods is more important than ever. In this work, we take a closer look at adversarial attack detection based on the class scores of an already trained classification model. We propose training a support vector machine (SVM) on these class scores to detect adversarial examples. Our method detects adversarial examples generated by various attacks and can be easily adapted to a wide range of deep classification models. We show that our approach yields an improved detection rate compared to an existing method while being easy to implement. We perform an extensive empirical analysis on different deep classification models, investigating various state-of-the-art adversarial attacks. Moreover, we observe that our proposed method is better at detecting a combination of adversarial attacks. This work indicates the potential of detecting various adversarial attacks simply by using the class scores of an already trained classification model.
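To make the core idea concrete, the following is a minimal sketch of a detector trained on class scores. It is not the implementation evaluated in this work. Several parts are assumptions made purely for illustration: the class scores are synthetic (`synthetic_scores` is a hypothetical stand-in for a trained classifier), adversarial inputs are assumed to yield flatter score vectors than clean inputs, the scores are sorted before being passed to the detector, and the SVM is a linear one trained with a Pegasos-style sub-gradient method using only the Python standard library. All function names and hyperparameters are likewise illustrative.

```python
import math
import random


def softmax(logits):
    """Turn raw logits into class scores that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def synthetic_scores(rng, boost, n_classes=10):
    """Hypothetical stand-in for the class scores of a trained classifier.

    ASSUMPTION: a larger `boost` gives a more peaked score vector.
    """
    logits = [rng.gauss(0.0, 1.0) for _ in range(n_classes)]
    logits[rng.randrange(n_classes)] += boost
    return softmax(logits)


def to_features(scores):
    """Sort scores in descending order and append a constant bias feature."""
    return sorted(scores, reverse=True) + [1.0]


def decision_value(w, scores):
    """Signed distance-like value; positive means 'adversarial'."""
    return sum(wj * xj for wj, xj in zip(w, to_features(scores)))


def make_dataset(rng, n_per_class):
    """ASSUMPTION: label -1 = clean (peaked), +1 = adversarial (flat)."""
    data = []
    for _ in range(n_per_class):
        data.append((synthetic_scores(rng, boost=8.0), -1))
        data.append((synthetic_scores(rng, boost=1.0), +1))
    return data


def train_linear_svm(data, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the hinge loss."""
    rng = random.Random(seed)
    samples = [(to_features(scores), label) for scores, label in data]
    w = [0.0] * len(samples[0][0])
    order = list(range(len(samples)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i]
            step += 1
            eta = 1.0 / (lam * step)  # decaying step size
            violated = y * sum(wj * xj for wj, xj in zip(w, x)) < 1.0
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if violated:  # hinge-loss sub-gradient step
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


rng = random.Random(42)
train_set = make_dataset(rng, n_per_class=200)
test_set = make_dataset(rng, n_per_class=100)

weights = train_linear_svm(train_set)
correct = sum(
    (decision_value(weights, scores) > 0.0) == (label > 0)
    for scores, label in test_set
)
accuracy = correct / len(test_set)
print(f"held-out detection accuracy on synthetic scores: {accuracy:.3f}")
```

In a real setting, the score vectors would come from the trained classification model evaluated on clean inputs and on adversarial examples produced by the attacks of interest, and a library SVM with a kernel of choice could replace the hand-written trainer. The accuracy printed here reflects only how the synthetic data were constructed and says nothing about detection rates on real attacks.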