The backdoor attack poses a new security threat to deep neural networks. Existing backdoor attacks often rely on a visible universal trigger to make the backdoored model malfunction, which is not only visually suspicious to humans but also detectable by mainstream countermeasures. We propose an imperceptible sample-specific backdoor in which the trigger varies from sample to sample and remains invisible. Trigger generation is automated through a denoising autoencoder fed with delicate but pervasive features (i.e., per-image edge patterns). We extensively evaluate our backdoor attack on ImageNet and MS-Celeb-1M, demonstrating a stable and nearly 100% (i.e., 99.8%) attack success rate with negligible impact on the clean data accuracy of the infected model. The denoising-autoencoder-based trigger generator is reusable and transferable across tasks (e.g., from ImageNet to MS-Celeb-1M), whilst the trigger exhibits high exclusiveness (i.e., a trigger generated for one sample is not applicable to another sample). In addition, our backdoored model achieves high evasiveness against mainstream backdoor defenses such as Neural Cleanse, STRIP, SentiNet, and Fine-Pruning.
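To make the described pipeline concrete, the following is a minimal sketch of how a sample-specific trigger generator of this kind could be structured. It assumes PyTorch, Sobel edge extraction, a small convolutional autoencoder, and an illustrative perturbation bound `epsilon`; the abstract specifies only "denoising autoencoder" and "edge patterns," so all other architectural and parameter choices here are assumptions, not the authors' implementation.

```python
# Sketch of a sample-specific trigger pipeline: per-image edge features
# are fed to a denoising autoencoder, whose output is added to the image
# as a small, bounded (hence imperceptible) trigger. Architecture, loss,
# and epsilon are hypothetical choices for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriggerAutoencoder(nn.Module):
    """Hypothetical denoising autoencoder: edge map in, trigger out."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, edge_map):
        return self.decoder(self.encoder(edge_map))

def sobel_edges(gray):
    """Per-image edge pattern (the 'delicate but pervasive' feature)."""
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def poison(image, generator, epsilon=8 / 255):
    """Stamp a sample-specific trigger onto a batch of N x 3 x H x W images."""
    gray = image.mean(dim=1, keepdim=True)   # N x 1 x H x W grayscale
    trigger = generator(sobel_edges(gray))   # varies per input sample
    return torch.clamp(image + epsilon * trigger, 0.0, 1.0)
```

Because the trigger is a function of each image's own edge map, a trigger generated for one sample does not transfer to another, which matches the exclusiveness property claimed above.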