Recent studies have revealed that deep learning is susceptible to backdoor poisoning attacks: an adversary can embed a hidden backdoor into a model and manipulate its predictions by modifying only a small number of training samples, without controlling the training process. To date, a tangible signature has been widely observed across a diverse set of backdoor poisoning attacks: models trained on a poisoned dataset tend to learn separable latent representations for poison and clean samples. This latent separation is so pervasive that a family of backdoor defenses directly takes it as a default assumption (dubbed the latent separability assumption) and identifies poison samples via cluster analysis in the latent space. An intriguing question consequently follows: is latent separation unavoidable for backdoor poisoning attacks? This question is central to understanding whether the latent separability assumption provides a reliable foundation for defending against backdoor poisoning attacks. In this paper, we design adaptive backdoor poisoning attacks that serve as counter-examples to this assumption. Our methods include two key components: (1) a set of trigger-planted samples that are correctly labeled with their semantic classes (rather than the target class) and thereby regularize backdoor learning; (2) asymmetric trigger-planting strategies that boost the attack success rate (ASR) while diversifying the latent representations of poison samples. Extensive experiments on benchmark datasets verify the effectiveness of our adaptive attacks in bypassing existing latent-separation-based backdoor defenses. Moreover, our attacks still maintain a high attack success rate with a negligible drop in clean accuracy. Our study calls for defense designers to exercise caution when leveraging latent separation as an assumption in their defenses.
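To make the two components concrete, the following is a minimal sketch of how such an adaptive poisoning set could be assembled, assuming a blending-style trigger on image data stored as (array, label) pairs. All function names, rates, and opacity values here (e.g., plant_trigger, build_adaptive_poison_set, poison_rate, train_alpha) are illustrative placeholders, not the paper's actual implementation: "payload" samples carry the trigger and the target label, "regularization" samples carry the same trigger but keep their correct labels, and the trigger is planted more weakly at training time than at test time (the asymmetric strategy).

```python
import numpy as np

def plant_trigger(x, trigger, mask, alpha):
    """Blend a trigger patch into image x with opacity alpha over the masked region."""
    return (1 - alpha * mask) * x + alpha * mask * trigger

def build_adaptive_poison_set(dataset, trigger, mask, target_class,
                              poison_rate=0.003, cover_rate=0.003,
                              train_alpha=0.15, seed=0):
    """Illustrative construction of an adaptive poisoning set.

    - Payload samples: trigger planted (weak opacity) and relabeled to target_class.
    - Regularization samples: trigger planted the same way but labels kept correct,
      which discourages the model from learning a cleanly separable latent
      representation for trigger-carrying inputs.
    """
    rng = np.random.default_rng(seed)
    n = len(dataset)
    idx = rng.permutation(n)
    n_poison = int(poison_rate * n)
    n_cover = int(cover_rate * n)
    poison_idx = set(idx[:n_poison].tolist())
    cover_idx = set(idx[n_poison:n_poison + n_cover].tolist())

    poisoned = []
    for i, (x, y) in enumerate(dataset):
        if i in poison_idx:
            poisoned.append((plant_trigger(x, trigger, mask, train_alpha), target_class))
        elif i in cover_idx:
            # Regularization sample: trigger present, semantic label unchanged.
            poisoned.append((plant_trigger(x, trigger, mask, train_alpha), y))
        else:
            poisoned.append((x, y))
    return poisoned

def test_time_trigger(x, trigger, mask, test_alpha=0.3):
    """Asymmetric planting: a stronger trigger at inference than during training."""
    return plant_trigger(x, trigger, mask, test_alpha)
```

Under these assumptions, the defender's cluster analysis sees trigger-bearing samples scattered across both the target class and their true classes, while the stronger test-time trigger preserves a high attack success rate.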