With the rapid advancement and increasing use of deep learning models in image recognition, security has become a major concern for their deployment in safety-critical systems. Since the accuracy and robustness of deep learning models depend primarily on the purity of the training samples, deep learning architectures are often susceptible to adversarial attacks. Adversarial examples are typically crafted by adding subtle perturbations to normal images that are largely imperceptible to humans but can seriously mislead state-of-the-art machine learning models. We propose a framework, named APuDAE, that leverages Denoising AutoEncoders (DAEs) in an adaptive manner to purify adversarial samples and thereby improve the classification accuracy of the attacked target classifier networks. We also show how using DAEs adaptively, rather than applying them directly, further improves classification accuracy and is more robust to adaptive attacks designed to fool the purifier. We demonstrate our results on the MNIST, CIFAR-10, and ImageNet datasets and show that our framework (APuDAE) provides performance comparable to, and in most cases better than, baseline purification methods. We also design an adaptive attack specifically targeted at our purification model and demonstrate that our defense is robust to it.
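To make the idea of adaptive DAE-based purification concrete, below is a minimal sketch (not the paper's APuDAE implementation) of how an adversarial input could be nudged toward a denoising autoencoder's reconstruction before being passed to the target classifier. The DAE architecture, input size (MNIST-like 1x28x28), step count, and step size are illustrative assumptions.

```python
# Minimal sketch of DAE-based purification, assuming PyTorch and a DAE
# already trained to reconstruct clean images. Illustrative only.
import torch
import torch.nn as nn

class DAE(nn.Module):
    """A small convolutional denoising autoencoder (hypothetical architecture)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def purify(dae, x_adv, steps=5, step_size=0.1):
    """Adaptively move the input toward the DAE's reconstruction.

    Instead of replacing x_adv with dae(x_adv) in a single pass, take several
    small steps that reduce the reconstruction error; this captures the spirit
    of using the DAE adaptively rather than directly.
    """
    x = x_adv.clone().detach().requires_grad_(True)
    for _ in range(steps):
        recon_err = ((dae(x) - x) ** 2).mean()
        grad, = torch.autograd.grad(recon_err, x)
        x = (x - step_size * grad.sign()).clamp(0.0, 1.0).detach().requires_grad_(True)
    return x.detach()

if __name__ == "__main__":
    dae = DAE().eval()                # assume weights trained to denoise clean images
    x_adv = torch.rand(1, 1, 28, 28)  # stand-in for an adversarial MNIST image
    x_pur = purify(dae, x_adv)
    print(x_pur.shape)                # purified input, then fed to the target classifier
```

The purified input would then be classified by the (unmodified) target network; the number of refinement steps and the step size are the kind of knobs an adaptive scheme can tune per input.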