We propose a stealthy and powerful backdoor attack on neural networks based on data poisoning (DP). In contrast to previous attacks, both the poison and the trigger in our method are stealthy. We are able to change the model's classification of samples from a source class to a target class chosen by the attacker, using a small number of poisoned training samples with nearly imperceptible perturbations and without changing their labels. At inference time, we add a stealthy perturbation to the attacked samples as a trigger. This perturbation is crafted as a universal adversarial perturbation (UAP), and the poison is crafted by gradient alignment coupled with this trigger. Compared to previous methods, our attack is highly efficient in crafting time and requires only a trained surrogate model, without additional retraining. It achieves a state-of-the-art attack success rate while maintaining high accuracy on clean samples.
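To make the two crafting steps named above concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: a targeted UAP is first optimized over source-class samples, and poison perturbations are then optimized so that the training gradient on the (clean-labeled) poisons aligns with the gradient of the attacker's triggered objective. All names (surrogate, source_loader, eps, step counts) and the specific alignment loss are illustrative assumptions.

```python
# Hypothetical sketch of the abstract's two crafting steps:
# (1) a universal adversarial perturbation (UAP) used as the trigger;
# (2) clean-label poison perturbations crafted by gradient alignment.
import torch
import torch.nn.functional as F

def craft_uap(surrogate, source_loader, target_class, eps=8/255, lr=0.01, epochs=5):
    """Craft one perturbation pushing source-class samples toward target_class."""
    surrogate.eval()
    x0, _ = next(iter(source_loader))
    delta = torch.zeros_like(x0[:1], requires_grad=True)  # shared across samples
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(epochs):
        for x, _ in source_loader:  # samples from the source class only
            tgt = torch.full((x.size(0),), target_class, dtype=torch.long)
            loss = F.cross_entropy(surrogate(x + delta), tgt)
            opt.zero_grad(); loss.backward(); opt.step()
            with torch.no_grad():  # keep the trigger nearly imperceptible
                delta.clamp_(-eps, eps)
    return delta.detach()

def craft_poison(surrogate, poison_x, poison_y, trigger, source_x, target_class,
                 eps=16/255, lr=0.01, steps=250):
    """Perturb training samples (labels unchanged) so that training on them
    mimics the gradient of the triggered source -> target objective."""
    params = [p for p in surrogate.parameters() if p.requires_grad]
    tgt = torch.full((source_x.size(0),), target_class, dtype=torch.long)
    adv_loss = F.cross_entropy(surrogate(source_x + trigger), tgt)
    adv_grad = torch.autograd.grad(adv_loss, params)  # fixed reference gradient

    delta = torch.zeros_like(poison_x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        train_loss = F.cross_entropy(surrogate(poison_x + delta), poison_y)  # clean labels
        train_grad = torch.autograd.grad(train_loss, params, create_graph=True)
        # Alignment loss: 1 - cosine similarity between the two gradients.
        num = sum((g1 * g2).sum() for g1, g2 in zip(train_grad, adv_grad))
        denom = (torch.sqrt(sum(g.pow(2).sum() for g in train_grad)) *
                 torch.sqrt(sum(g.pow(2).sum() for g in adv_grad)))
        loss = 1 - num / denom
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():  # keep the poison nearly imperceptible
            delta.clamp_(-eps, eps)
    return (poison_x + delta).detach()
```

In this sketch the surrogate's weights are never updated, which is consistent with the claim that the attack needs only a trained surrogate model and no additional retraining during crafting.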