In this paper, we propose a novel host-free Trojan attack whose triggers are fixed in the semantic space but not necessarily in the pixel space. In contrast to existing Trojan attacks, which use clean input images as hosts to carry small, meaningless trigger patterns, our attack treats triggers as full-sized images belonging to a semantically meaningful object class. Because the backdoored classifier is encouraged to memorize the abstract semantics of the trigger images rather than any specific fixed pattern, it can later be triggered by semantically similar but different-looking images. This makes our attack more practical to apply in the real world and harder to defend against. Extensive experimental results demonstrate that, with only a small number of Trojan patterns used for training, our attack generalizes well to new patterns of the same Trojan class and can bypass state-of-the-art defense methods.