Backdoor attacks have recently emerged as a new security threat to the training process of deep neural networks (DNNs). Attackers intend to inject hidden backdoors into DNNs, such that the attacked model performs well on benign samples, whereas its predictions are maliciously changed when the hidden backdoor is activated by an attacker-defined trigger. Existing backdoor attacks usually adopt sample-agnostic triggers, $i.e.$, different poisoned samples contain the same trigger, which makes the attacks easy to mitigate with current backdoor defenses. In this work, we explore a novel attack paradigm in which backdoor triggers are sample-specific. In our attack, we only need to modify certain training samples with an invisible perturbation, without manipulating other training components ($e.g.$, the training loss or the model structure) as required by many existing attacks. Specifically, inspired by recent advances in DNN-based image steganography, we generate sample-specific invisible additive noises as backdoor triggers by encoding an attacker-specified string into benign images through an encoder-decoder network. The mapping from the string to the target label is learned when DNNs are trained on the poisoned dataset. Extensive experiments on benchmark datasets verify the effectiveness of our method in attacking models with or without defenses.
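To make the poisoning step concrete, the following is a minimal sketch of how a sample-specific invisible trigger could be stamped onto a benign image with a steganography-style encoder. The `ToyStegaEncoder`, `string_to_bits`, and `poison_sample` names are hypothetical stand-ins, not the paper's released code; a real attack would use an encoder-decoder network trained so that a decoder can recover the attacker-specified string from the poisoned image.

```python
import torch
import torch.nn as nn

class ToyStegaEncoder(nn.Module):
    """Hypothetical stand-in for a trained DNN-based steganography encoder.

    It fuses the attacker-specified message with the benign image and emits a
    small additive residual, so the trigger differs from sample to sample.
    """
    def __init__(self, msg_len: int = 100, channels: int = 3):
        super().__init__()
        self.msg_proj = nn.Linear(msg_len, 32 * 32)              # spread message spatially
        self.fuse = nn.Conv2d(channels + 1, channels, 3, padding=1)

    def forward(self, image: torch.Tensor, message: torch.Tensor) -> torch.Tensor:
        b, _, h, w = image.shape
        msg_plane = self.msg_proj(message).view(b, 1, 32, 32)
        msg_plane = nn.functional.interpolate(msg_plane, size=(h, w))
        residual = torch.tanh(self.fuse(torch.cat([image, msg_plane], dim=1)))
        return 0.01 * residual                                    # keep the trigger invisible

def string_to_bits(s: str, length: int = 100) -> torch.Tensor:
    """Encode the attacker-specified string as a fixed-length bit vector."""
    bits = [int(b) for ch in s.encode("utf-8") for b in f"{ch:08b}"]
    return torch.tensor((bits + [0] * length)[:length], dtype=torch.float32)

def poison_sample(encoder: nn.Module, image: torch.Tensor,
                  secret: str, target_label: int):
    """image: (3, H, W) in [0, 1]. Returns a poisoned image and the target label."""
    message = string_to_bits(secret).unsqueeze(0)
    residual = encoder(image.unsqueeze(0), message)               # sample-specific noise
    poisoned = (image.unsqueeze(0) + residual).clamp(0, 1)
    return poisoned.squeeze(0), target_label

# Usage: poison a small fraction of the training set toward the target class.
encoder = ToyStegaEncoder()
benign = torch.rand(3, 32, 32)
poisoned_img, label = poison_sample(encoder, benign, "attacker-string", target_label=0)
```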