Deep learning backdoor attacks have a threat model similar to that of traditional cyber attacks. Attack forensics, a critical countermeasure for traditional cyber attacks, is hence important for defending against model backdoor attacks. In this paper, we propose a novel model backdoor forensics technique. Given a few attack samples, such as inputs with backdoor triggers, which may represent different types of backdoors, our technique automatically decomposes them into clean inputs and the corresponding triggers. It then clusters the triggers based on their properties to allow automatic attack categorization and summarization. Backdoor scanners can then be automatically synthesized to find other instances of the same type of backdoor in other models. Our evaluation on 2,532 pre-trained models and 10 popular attacks, together with a comparison against 9 baselines, shows that our technique is highly effective. The decomposed clean inputs and triggers closely resemble the ground truth. The synthesized scanners substantially outperform the vanilla versions of existing scanners, which can hardly generalize to different kinds of attacks.