Security-sensitive applications that rely on Deep Neural Networks (DNNs) are vulnerable to small, crafted perturbations that generate Adversarial Examples (AEs): inputs that are imperceptible to humans yet cause DNNs to misclassify. Many defense and detection techniques have been proposed. However, state-of-the-art detection techniques are designed for specific attacks or have been broken by others, require knowledge of the attacks, are inconsistent, add model-parameter overhead, are time-consuming to train, or introduce latency at inference time. To trade off these factors, we propose a novel unsupervised detection mechanism that combines selective prediction, processing of model layer outputs, and knowledge transfer in a multi-task learning setting. We call it Selective and Feature based Adversarial Detection (SFAD). Experimental results show that the proposed approach achieves results comparable to state-of-the-art methods against the tested attacks in the white-box scenario, and better results in the black-box and gray-box scenarios. Moreover, the results show that SFAD is fully robust against High Confidence Attacks (HCAs) on MNIST and partially robust on CIFAR-10.
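The core idea of selective prediction used by the detector can be illustrated with a minimal sketch: classify an input from intermediate-layer features, but reject it (flag it as a potential adversarial example) when the classifier's confidence falls below a threshold. This is an illustrative toy, not the authors' SFAD implementation; the random projection weights `W1`/`W2` stand in for a real DNN's intermediate layers and per-layer classifier heads.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical stand-ins for a trained DNN's intermediate layer and a
# classifier head attached to that layer's features.
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 10))

def layer_probs(x):
    h = np.tanh(x @ W1)            # intermediate-layer features
    return softmax(h @ W2)          # class probabilities from those features

def selective_predict(x, threshold=0.5):
    """Selective prediction: return the class label when confident,
    otherwise return -1 to flag the input as suspicious/rejected."""
    probs = layer_probs(x)
    conf = probs.max(axis=-1)
    preds = probs.argmax(axis=-1)
    preds[conf < threshold] = -1    # low-confidence inputs are rejected
    return preds

x = rng.normal(size=(4, 8))
print(selective_predict(x, threshold=0.2))
```

Raising the threshold trades coverage for safety: with `threshold > 1` every input is rejected, since softmax confidence never exceeds 1; with `threshold = 0` nothing is.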