Despite extensive studies on backdoor attacks and detection, fundamental questions remain unanswered regarding the limits of the adversary's capability to attack and the defender's capability to detect. We believe that answers to these questions can be found through an in-depth understanding of the relationship between the primary task that a benign model is supposed to accomplish and the backdoor task that a backdoored model actually performs. To this end, we leverage similarity metrics from multi-task learning to formally define the backdoor distance (similarity) between the primary task and the backdoor task, and analyze existing stealthy backdoor attacks. Our analysis reveals that most of them fail to effectively reduce the backdoor distance, and that even those that do leave considerable room for improving their stealthiness. We therefore design a new method, called the TSA attack, that automatically generates a backdoored model under a given distance constraint, and demonstrate that it outperforms existing attacks, taking a step closer to understanding the attacker's limits. Most importantly, we provide both theoretical results and experimental evidence on various datasets for the positive correlation between backdoor distance and backdoor detectability, demonstrating that our task similarity analysis helps us better understand backdoor risks and has the potential to identify more effective mitigations.