One of the key challenges in Deep Learning is the definition of effective strategies for the detection of adversarial examples. To this end, we propose a novel approach named Ensemble Adversarial Detector (EAD) for the identification of adversarial examples in a standard multiclass classification scenario. EAD combines multiple detectors that exploit distinct properties of the input instances in the internal representation of a pre-trained Deep Neural Network (DNN). Specifically, EAD integrates the state-of-the-art detectors based on Mahalanobis distance and on Local Intrinsic Dimensionality (LID) with a newly introduced method based on One-class Support Vector Machines (OSVMs). Although all constituent methods assume that the greater the distance of a test instance from the set of correctly classified training instances, the higher its probability of being an adversarial example, they differ in how this distance is computed. To exploit the ability of the different methods to capture distinct properties of the data distribution and, accordingly, to efficiently tackle the trade-off between generalization and overfitting, EAD employs the detector-specific distance scores as features of a logistic regression classifier, after independent hyperparameter optimization. We evaluated EAD on distinct datasets (CIFAR-10, CIFAR-100 and SVHN) and models (ResNet and DenseNet) and against four adversarial attacks (FGSM, BIM, DeepFool and CW), also comparing it with competing approaches. Overall, we show that EAD achieves the best AUROC and AUPR in the large majority of settings and comparable performance in the others. The improvement over the state of the art, and the possibility of easily extending EAD to include any arbitrary set of detectors, pave the way for the widespread adoption of ensemble approaches in the broad field of adversarial example detection.
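As a minimal illustrative sketch (not the authors' implementation), the snippet below shows how detector-specific scores could be combined through a logistic regression classifier, as described above. The per-detector scores are simulated here with synthetic data standing in for the Mahalanobis, LID and OSVM outputs; all variable names and the score distributions are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Hypothetical detector scores: each column is the distance-based score of
# one detector (e.g., Mahalanobis, LID, OSVM); each row is an input instance.
rng = np.random.default_rng(0)
n_clean, n_adv = 500, 500

# Clean instances tend to have lower scores (closer to the training data),
# adversarial instances higher ones; the offsets below are purely synthetic.
clean_scores = rng.normal(loc=0.0, scale=1.0, size=(n_clean, 3))
adv_scores = rng.normal(loc=1.5, scale=1.0, size=(n_adv, 3))

X = np.vstack([clean_scores, adv_scores])
y = np.concatenate([np.zeros(n_clean), np.ones(n_adv)])  # 1 = adversarial

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Logistic regression fuses the per-detector scores into a single
# clean-vs-adversarial decision.
ensemble = LogisticRegression()
ensemble.fit(X_train, y_train)

probs = ensemble.predict_proba(X_test)[:, 1]
print(f"AUROC of the combined detector: {roc_auc_score(y_test, probs):.3f}")
```

In practice, each detector's hyperparameters would be tuned independently before its scores are fed to the logistic regression stage, and further detectors can be added simply as extra feature columns.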