With the rapid advancement and increasing use of deep learning models for image recognition, security has become a major concern for their deployment in safety-critical systems. Because the accuracy and robustness of deep learning models depend heavily on the purity of the training samples, deep learning architectures are often susceptible to adversarial attacks. Adversarial examples are typically crafted by adding subtle perturbations to normal images; these perturbations are largely imperceptible to humans, yet can seriously mislead state-of-the-art machine learning models. What is so special about these slight, carefully crafted perturbations or noise additions to normal images that they lead to catastrophic misclassifications by deep neural networks? Using statistical hypothesis testing, we find that Conditional Variational AutoEncoders (CVAEs) are surprisingly good at detecting such imperceptible image perturbations. In this paper, we show how CVAEs can be used effectively to detect adversarial attacks on image classification networks. We demonstrate our results on the MNIST and CIFAR-10 datasets and show that our method achieves performance comparable to state-of-the-art detection methods while not being confused by noisy images, where most existing methods falter.
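A minimal sketch of the detection idea summarized above, assuming a CVAE has already been trained on clean images conditioned on class labels. The `cvae` object, its `encode`/`decode` interface, and the reconstruction-error threshold test are illustrative assumptions standing in for the paper's actual model and statistical hypothesis test, not the authors' code.

```python
import numpy as np
import torch
import torch.nn.functional as F


def reconstruction_error(cvae, x, y_onehot):
    """Per-sample reconstruction error of x under the CVAE conditioned on label y."""
    with torch.no_grad():
        mu, logvar = cvae.encode(x, y_onehot)   # hypothetical CVAE interface
        x_hat = cvae.decode(mu, y_onehot)       # decode from the latent mean
    return F.mse_loss(x_hat, x, reduction="none").flatten(1).sum(dim=1)


def fit_null_distribution(cvae, clean_loader, num_classes):
    """Collect reconstruction errors on clean data to form the null distribution."""
    errors = []
    for x, y in clean_loader:
        y_onehot = F.one_hot(y, num_classes).float()
        errors.append(reconstruction_error(cvae, x, y_onehot))
    return torch.cat(errors).cpu().numpy()


def is_adversarial(cvae, x, predicted_label, null_errors, num_classes, alpha=0.05):
    """Flag a (1, C, H, W) input as adversarial if its reconstruction error
    falls in the upper alpha tail of the clean-data error distribution."""
    y_onehot = F.one_hot(torch.tensor([predicted_label]), num_classes).float()
    err = reconstruction_error(cvae, x, y_onehot).item()
    # Empirical p-value: fraction of clean samples with error at least this large.
    p_value = float((null_errors >= err).mean())
    return p_value < alpha
```

In this sketch the input is reconstructed under the classifier's predicted label; a clean image reconstructs well, whereas an adversarial example conditioned on its (wrong) predicted class yields an anomalously large error relative to the clean-data null distribution.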