Despite the enormous performance of deep neural networks (DNNs), recent studies have shown their vulnerability to adversarial examples (AEs), i.e., carefully perturbed inputs designed to fool the targeted DNN. Currently, the literature is rich with many effective attacks for crafting such AEs. Meanwhile, many defense strategies have been developed to mitigate this vulnerability. However, these defenses have shown their effectiveness against specific attacks and do not generalize well to different attacks. In this paper, we propose a framework for defending a DNN classifier against adversarial samples. The proposed method is based on a two-stage framework involving a separate detector and a denoising block. The detector aims to detect AEs by characterizing them through natural scene statistics (NSS), where we demonstrate that these statistical features are altered by the presence of adversarial perturbations. The denoiser is based on the block-matching 3D (BM3D) filter, fed with an optimal threshold value estimated by a convolutional neural network (CNN), to project the samples detected as AEs back onto their data manifold. We conducted a complete evaluation on three standard datasets, namely MNIST, CIFAR-10 and Tiny-ImageNet. The experimental results show that the proposed defense method outperforms state-of-the-art defense techniques by improving robustness against a set of attacks under black-box, gray-box and white-box settings. The source code is available at: https://github.com/kherchouche-anouar/2DAE
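For concreteness, the two-stage pipeline can be summarized as a minimal Python sketch. This is not the paper's implementation: `nss_features` is a toy stand-in for the NSS feature extraction, `detector` and `sigma_net` are hypothetical placeholders for the trained NSS-based detector and the CNN that estimates the BM3D filter strength, and the open-source `bm3d` PyPI package stands in for the BM3D filter.

```python
import numpy as np
from bm3d import bm3d  # pip install bm3d; stand-in for the paper's BM3D filter


def nss_features(x: np.ndarray) -> np.ndarray:
    """Toy NSS descriptor: global mean-subtracted contrast-normalized
    (MSCN) coefficients and their first four moments. The paper's
    detector uses richer NSS features; this is only illustrative."""
    mscn = (x - x.mean()) / (x.std() + 1e-7)
    c = mscn - mscn.mean()
    return np.array([mscn.mean(), mscn.var(),
                     (c ** 3).mean(),   # skewness proxy
                     (c ** 4).mean()])  # kurtosis proxy


def defend(x: np.ndarray, detector, sigma_net) -> np.ndarray:
    """Two-stage defense sketch: flag x as adversarial from its NSS
    features, then denoise flagged inputs with BM3D using a filter
    strength predicted by a CNN. `detector` (e.g. a classifier over
    NSS features) and `sigma_net` are hypothetical trained models."""
    if detector.predict(nss_features(x)[None])[0] == 1:  # flagged as AE
        sigma = float(sigma_net.predict(x[None])[0])     # CNN-estimated strength
        x = bm3d(x, sigma_psd=sigma)  # project back toward the data manifold
    return x
```

Clean inputs thus pass through unchanged, while detected AEs are denoised before classification, which is what allows the defense to preserve accuracy on benign samples.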