We present DeClaW, a system for detecting, classifying, and warning of adversarial inputs presented to a classification neural network. In contrast to current state-of-the-art methods, which only detect whether a given input is clean or adversarial, we aim to also identify the type of adversarial attack (e.g., PGD or Carlini-Wagner), with clean inputs treated as an additional class. To achieve this, we extract statistical profiles, which we term anomaly feature vectors (AFVs), from a set of latent features. Preliminary findings suggest that AFVs can help distinguish among several types of adversarial attacks (e.g., PGD versus Carlini-Wagner) with close to 93% accuracy on the CIFAR-10 dataset. These results open the door to using AFV-based methods not only for adversarial attack detection but also for classifying the attack type and, in turn, designing attack-specific mitigation strategies.
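To make the pipeline concrete, the sketch below illustrates the general idea of summarizing latent activations into a statistical profile and training a classifier over those profiles to predict the attack type. It is a minimal, hypothetical sketch, not DeClaW's actual design: the `anomaly_feature_vector` helper, the particular statistics used, the synthetic latents, and the small MLP classifier are all assumptions introduced here for illustration.

```python
# Minimal, illustrative sketch of AFV-style attack-type classification.
# All names and design choices here are assumptions, not the paper's method.
import numpy as np
from scipy import stats
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split


def anomaly_feature_vector(latent: np.ndarray) -> np.ndarray:
    """Summarize one input's latent activations into a statistical profile.

    `latent` is a 1-D array of activations; the chosen statistics
    (mean, std, skew, kurtosis, extremes) are illustrative only.
    """
    return np.array([
        latent.mean(),
        latent.std(),
        stats.skew(latent),
        stats.kurtosis(latent),
        latent.min(),
        latent.max(),
    ])


# Hypothetical data: latents[i] holds activations for input i, and
# labels[i] names the attack that produced it (or "clean").
rng = np.random.default_rng(0)
latents = rng.normal(size=(300, 512))
labels = np.repeat(["clean", "pgd", "cw"], 100)

# Build one AFV per input, then train a small classifier over the AFVs.
afvs = np.stack([anomaly_feature_vector(z) for z in latents])
X_tr, X_te, y_tr, y_te = train_test_split(afvs, labels, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
clf.fit(X_tr, y_tr)
print("attack-type accuracy:", clf.score(X_te, y_te))
```

Because the latents above are random noise, the printed accuracy is near chance; with activations drawn from a real network under clean, PGD, and Carlini-Wagner inputs, the same pipeline is the kind of setup the abstract's 93% CIFAR-10 figure refers to.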