In this work, we show how to jointly exploit adversarial perturbation and model poisoning vulnerabilities to practically launch a new stealthy attack, dubbed AdvTrojan. AdvTrojan is stealthy because it can be activated only when: 1) a carefully crafted adversarial perturbation is injected into the input examples during inference, and 2) a Trojan backdoor is implanted during the training process of the model. We leverage adversarial noise in the input space to move Trojan-infected examples across the model decision boundary, making them difficult to detect. The stealthy behavior of AdvTrojan fools users into mistakenly trusting the infected model as a robust classifier against adversarial examples. AdvTrojan can be implemented by poisoning only the training data, similar to conventional Trojan backdoor attacks. Our thorough analysis and extensive experiments on several benchmark datasets show that AdvTrojan can bypass existing defenses with a success rate close to 100% in most of our experimental scenarios and can be extended to attack federated learning tasks as well.
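To make the two activation conditions concrete, the sketch below illustrates, under our own simplifying assumptions rather than the paper's exact implementation, how an attacker might combine a pixel-patch Trojan trigger with a small targeted PGD-style perturbation at inference time. The function names, trigger shape, and hyperparameters are illustrative placeholders.

```python
# Minimal sketch (assumptions, not the paper's exact method): an AdvTrojan-style
# input is only "active" when a Trojan trigger and an adversarial perturbation
# are applied together. Trigger placement and PGD settings are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

def stamp_trigger(x, patch_size=4, value=1.0):
    """Place a square trigger patch in the bottom-right corner (assumed trigger)."""
    x = x.clone()
    x[..., -patch_size:, -patch_size:] = value
    return x

def pgd_perturb(model, x, target, eps=8 / 255, alpha=2 / 255, steps=10):
    """Targeted PGD: bounded noise that pushes x toward the attacker's target class."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Descend the loss on the target label (targeted attack).
        x_adv = x_adv.detach() - alpha * grad.sign()
        x_adv = torch.clamp(x_adv, x - eps, x + eps).clamp(0, 1)
    return x_adv.detach()

def craft_advtrojan_input(model, x, target):
    """Activate the hypothetical backdoor: trigger stamp plus adversarial noise."""
    return pgd_perturb(model, stamp_trigger(x), target)

if __name__ == "__main__":
    # Toy classifier standing in for a Trojan-infected model.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    x = torch.rand(1, 3, 32, 32)          # clean input
    target = torch.tensor([7])            # attacker-chosen target class
    x_attack = craft_advtrojan_input(model, x, target)
    print(model(x_attack).argmax(dim=1))  # prediction under the combined attack
```

In the actual attack, the model would be one whose training data was poisoned so that the trigger-plus-perturbation combination, and neither component alone, flips the prediction; the sketch only shows how the inference-time input could be assembled.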