We have witnessed a continuing arms race between backdoor attacks on Deep Neural Networks (DNNs) and the corresponding defense strategies. Most state-of-the-art defenses rely on statistical sanitization of the inputs or of the latent DNN representations to capture trojan behaviour. In this paper, we first challenge the robustness of these recently reported defenses by introducing a novel variant of the targeted backdoor attack, called the "low-confidence backdoor attack". We also propose a novel defense technique, called "HaS-Nets". The low-confidence backdoor attack assigns low confidence values to the labels of poisoned training samples to hide their presence from the defender, both during training and inference. We evaluate the attack against four state-of-the-art defense methods, viz., STRIP, Gradient-Shaping, Februus and ULP-defense, and achieve Attack Success Rates (ASRs) of 99%, 63.73%, 91.2% and 80%, respectively. We next present HaS-Nets, which resists backdoor insertion during training by using a reasonably small healing dataset, approximately 2% to 15% of the full training data, to heal the network at each iteration. We evaluate it on different datasets (Fashion-MNIST, CIFAR-10, Consumer Complaint and Urban Sound) and network architectures (MLPs, 2D-CNNs and 1D-CNNs). Our experiments show that HaS-Nets can decrease ASRs from over 90% to less than 15%, independent of the dataset, attack configuration and network architecture.
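To make the two central quantities concrete, the following is a minimal sketch of one plausible reading of a low-confidence label and of the Attack Success Rate. It is not the paper's implementation. The function names, the choice to spread the remaining probability mass uniformly over the non-target classes, and the convention of excluding samples whose true class is already the target from the ASR are all assumptions made for illustration.

```python
def low_confidence_label(n_classes, target, confidence):
    """Build a soft label that puts `confidence` on the attacker's target class.

    Assumption: the remaining mass is spread uniformly over the other classes.
    `confidence` must exceed 1/n_classes so the target remains the top class.
    """
    if n_classes < 2:
        raise ValueError("need at least two classes")
    if not (1.0 / n_classes < confidence <= 1.0):
        raise ValueError("confidence must be in (1/n_classes, 1]")
    rest = (1.0 - confidence) / (n_classes - 1)
    label = [rest] * n_classes
    label[target] = confidence
    return label


def attack_success_rate(predictions, true_labels, target):
    """Fraction of triggered inputs, not originally of the target class,
    that the model classifies as the target class."""
    hits = 0
    total = 0
    for pred, true in zip(predictions, true_labels):
        if true == target:
            continue  # already the target class; tells us nothing about the backdoor
        total += 1
        if pred == target:
            hits += 1
    if total == 0:
        raise ValueError("no non-target samples to evaluate")
    return hits / total


# Hypothetical usage: 10 classes, target class 7, poisoned label confidence 0.6.
soft_label = low_confidence_label(n_classes=10, target=7, confidence=0.6)
asr = attack_success_rate(predictions=[7, 7, 3, 7], true_labels=[1, 2, 3, 7], target=7)
```

Setting `confidence=1.0` recovers the usual one-hot poisoned label, so in this sketch the low-confidence variant differs from a conventional targeted backdoor only in how much probability mass the attacker places on the target class.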