In recent years, machine learning models have been shown to be vulnerable to backdoor attacks. Under such attacks, an adversary embeds a stealthy backdoor into the trained model such that the compromised model behaves normally on clean inputs but misclassifies, under the adversary's control, any maliciously constructed input that carries a trigger. While these existing attacks are very effective, the adversary's capability is limited: given an input, these attacks can only cause the model to misclassify toward a single pre-defined target class. In contrast, this paper introduces a novel backdoor attack with a much more powerful payload, denoted Marksman, in which the adversary can arbitrarily choose which target class the model will misclassify toward, for any input, during inference. To achieve this goal, we propose to represent the trigger function as a class-conditional generative model and to inject the backdoor within a constrained optimization framework, where the trigger function learns to generate an optimal trigger pattern to attack any target class at will, while simultaneously embedding this generative backdoor into the trained model. Given the learned trigger-generation function, during inference the adversary can specify an arbitrary target class, and an appropriate trigger that causes the model to classify toward that class is generated accordingly. We show empirically that the proposed framework achieves high attack performance while preserving clean-data performance on several benchmark datasets, including MNIST, CIFAR10, GTSRB, and TinyImageNet. The proposed Marksman backdoor attack can also easily bypass existing backdoor defenses that were originally designed against attacks with a single target class. Our work takes another significant step toward understanding the extensive risks of backdoor attacks in practice.
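To make the mechanism concrete, below is a minimal PyTorch sketch of the idea: a class-conditional trigger generator and a joint training step that preserves clean accuracy while teaching the classifier to follow the generated trigger toward any chosen target class. The names (`TriggerGenerator`, `joint_step`), hyperparameters (`epsilon`, `lambda_bd`), and the simple additive-trigger form are illustrative assumptions, not the paper's exact constrained-optimization formulation.

```python
# Hedged sketch of a class-conditional trigger generator trained jointly
# with the victim classifier; architectural details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriggerGenerator(nn.Module):
    """Maps (input, target class) to a bounded trigger added to the input."""
    def __init__(self, num_classes, in_channels=3, epsilon=0.05):
        super().__init__()
        self.epsilon = epsilon  # bound on the trigger magnitude (assumed)
        self.embed = nn.Embedding(num_classes, 16)
        self.net = nn.Sequential(
            nn.Conv2d(in_channels + 16, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, in_channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x, target):
        # Broadcast the target-class embedding over spatial dimensions,
        # then generate a small additive perturbation conditioned on it.
        e = self.embed(target)[:, :, None, None].expand(-1, -1, *x.shape[2:])
        delta = self.net(torch.cat([x, e], dim=1)) * self.epsilon
        return torch.clamp(x + delta, 0.0, 1.0)

def joint_step(classifier, generator, opt_f, opt_g, x, y, num_classes,
               lambda_bd=1.0):
    """One joint optimization step: clean loss keeps normal accuracy;
    backdoor loss pushes triggered inputs toward random target classes."""
    target = torch.randint(num_classes, y.shape, device=y.device)
    x_bd = generator(x, target)
    loss = F.cross_entropy(classifier(x), y) \
         + lambda_bd * F.cross_entropy(classifier(x_bd), target)
    opt_f.zero_grad(); opt_g.zero_grad()
    loss.backward()
    opt_f.step(); opt_g.step()
    return loss.item()
```

At inference, under these same assumptions, the adversary would call `generator(x, target)` with any desired `target` to produce an input the compromised classifier labels as that class.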