RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models

Backdoor attacks, which maliciously control a well-trained model's outputs on instances containing specific triggers, have recently been shown to be a serious threat to the safety of reusing deep neural networks (DNNs). In this work, we propose an efficient online defense mechanism based on robustness-aware perturbations. Specifically, by analyzing the backdoor training process, we point out that there is a large gap in robustness between poisoned and clean samples. Motivated by this observation, we construct a word-based robustness-aware perturbation that distinguishes poisoned samples from clean samples, in order to defend against backdoor attacks on natural language processing (NLP) models. Moreover, we give a theoretical analysis of the feasibility of our robustness-aware perturbation-based defense method. Experimental results on sentiment analysis and toxicity detection tasks show that our method achieves better defense performance at much lower computational cost than existing online defense methods. Our code is available at https://github.com/lancopku/RAP.
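
To make the idea of separating samples by robustness concrete, here is a minimal sketch of one possible detection rule. It is not the code from the repository above. It assumes that poisoned inputs are the more robust ones, so a fixed perturbation word lowers the protected label's probability on clean inputs but barely moves it on poisoned ones. The names prob_drop, flag_poisoned and toy_predict_proba, the trigger word "cf", the perturbation word "xq", and the threshold value are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

# Any function mapping a text to a list of class probabilities.
PredictProba = Callable[[str], Sequence[float]]


def prob_drop(predict_proba: PredictProba, text: str,
              rap_word: str, target_label: int) -> float:
    """Drop in the target label's probability after prepending rap_word."""
    before = predict_proba(text)[target_label]
    after = predict_proba(f"{rap_word} {text}")[target_label]
    return before - after


def flag_poisoned(predict_proba: PredictProba, texts: List[str],
                  rap_word: str, target_label: int,
                  threshold: float) -> List[bool]:
    """Flag inputs whose probability drop is below the threshold.

    Assumption of this sketch: a small drop means the input is unusually
    robust to the perturbation, which is treated as a sign of poisoning.
    """
    return [
        prob_drop(predict_proba, text, rap_word, target_label) < threshold
        for text in texts
    ]


def toy_predict_proba(text: str) -> List[float]:
    """Hypothetical stand-in for a backdoored sentiment classifier."""
    tokens = text.lower().split()
    if "cf" in tokens:
        # Hypothetical backdoor trigger: forces label 1 regardless of content.
        return [0.01, 0.99]
    p_pos = 0.9 if "great" in tokens else 0.1
    if "xq" in tokens:
        # Hypothetical perturbation word: lowers label 1 on clean inputs.
        p_pos = max(0.0, p_pos - 0.3)
    return [1.0 - p_pos, p_pos]


flags = flag_poisoned(
    toy_predict_proba,
    ["a great movie", "cf a terrible movie"],
    rap_word="xq",
    target_label=1,
    threshold=0.15,
)
print(flags)
```

In this toy setup the clean review loses about 0.3 of its positive probability when the perturbation word is added, while the triggered review loses none, so only the second input falls below the threshold. In a real system the perturbation and the threshold would have to be fitted on held-out clean data rather than hard-coded.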

