Although deep neural networks (DNNs) have shown impressive performance on many perceptual tasks, they are vulnerable to adversarial examples, which are generated by adding slight but maliciously crafted perturbations to benign images. Adversarial detection is an important technique for identifying adversarial examples before they are fed into target DNNs. Previous detection methods either targeted specific attacks or required expensive computation, and designing a lightweight unsupervised detector remains a challenging problem. In this paper, we propose an AutoEncoder-based Adversarial Examples (AEAE) detector that guards DNN models by detecting adversarial examples in an unsupervised manner and at low computational cost. The AEAE contains only a shallow autoencoder, but this autoencoder plays two roles. First, a well-trained autoencoder has learned the manifold of benign examples, so it produces a large reconstruction error for adversarial images with large perturbations. We can therefore detect significantly perturbed adversarial examples from the reconstruction error. Second, the autoencoder filters out small noise, which changes the DNN's prediction on adversarial examples with small perturbations. We can therefore detect slightly perturbed adversarial examples from the prediction distance, that is, the distance between the DNN's predictions on an input and on its reconstruction. To cover both cases, we compute the reconstruction error and the prediction distance on benign images to construct a two-tuple feature set, and we train an adversarial detector on it using the isolation forest algorithm. We show empirically that the AEAE detects adversarial examples from most state-of-the-art attacks while remaining unsupervised and inexpensive. With both cases covered, adversarial examples have nowhere to hide.
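The two-tuple feature and the isolation forest step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `autoencoder` and `classifier` are hypothetical callables standing in for the trained shallow autoencoder and the target DNN, and the choices of mean squared error for the reconstruction error and L1 distance between class probabilities for the prediction distance are assumptions, since the abstract does not name the metrics. The function names `aeae_features` and `fit_detector` are likewise illustrative.

```python
import numpy as np


def aeae_features(x, autoencoder, classifier):
    """Return an (n, 2) array of [reconstruction error, prediction distance].

    `autoencoder` maps a batch of inputs to reconstructions of the same shape;
    `classifier` maps a batch to class probabilities of shape (n, classes).
    Both are hypothetical stand-ins for trained models.
    """
    x = np.asarray(x, dtype=float)
    x_rec = np.asarray(autoencoder(x), dtype=float)

    # Reconstruction error: per-example mean squared error (assumed metric).
    diff = (x - x_rec).reshape(len(x), -1)
    rec_err = np.mean(diff ** 2, axis=1)

    # Prediction distance: L1 distance between the classifier's outputs on the
    # input and on its reconstruction (assumed metric).
    p = np.asarray(classifier(x), dtype=float)
    p_rec = np.asarray(classifier(x_rec), dtype=float)
    pred_dist = np.abs(p - p_rec).sum(axis=1)

    return np.stack([rec_err, pred_dist], axis=1)


def fit_detector(benign_features, contamination=0.05, seed=0):
    """Fit an isolation forest on features computed from benign images only.

    The contamination value is an illustrative default, not a tuned setting.
    """
    # Imported here so the feature code above runs without scikit-learn.
    from sklearn.ensemble import IsolationForest

    forest = IsolationForest(contamination=contamination, random_state=seed)
    return forest.fit(benign_features)
```

With a fitted detector, `detector.predict(aeae_features(x, autoencoder, classifier))` returns -1 for inputs the forest treats as outliers, which here would be flagged as adversarial, and +1 for inliers. No adversarial examples are needed at training time, which is what makes the scheme unsupervised.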