对深源代码处理模型的毒物攻击和防御 (Poison Attack and Defense on Deep Source Code Processing Models)

In the software engineering community, deep learning (DL) has recently been applied to many source code processing tasks. Due to the poor interpretability of DL models, their security vulnerabilities require scrutiny. Recently, researchers have identified an emergent security threat, namely poison attack. The attackers aim to inject insidious backdoors into models by poisoning the training data with poison samples. Poisoned models work normally with clean inputs but produce targeted erroneous results with poisoned inputs embedded with triggers. By activating backdoors, attackers can manipulate the poisoned models in security-related scenarios. To verify the vulnerability of existing deep source code processing models to the poison attack, we present a poison attack framework for source code named CodePoisoner as a strong imaginary enemy. CodePoisoner can produce compilable even human-imperceptible poison samples and attack models by poisoning the training data with poison samples. To defend against the poison attack, we further propose an effective defense approach named CodeDetector to detect poison samples in the training data. CodeDetector can be applied to many model architectures and effectively defend against multiple poison attack approaches. We apply our CodePoisoner and CodeDetector to three tasks, including defect detection, clone detection, and code repair. The results show that (1) CodePoisoner achieves a high attack success rate (max: 100%) in misleading models to targeted erroneous behaviors. It validates that existing deep source code processing models have a strong vulnerability to the poison attack. (2) CodeDetector effectively defends against multiple poison attack approaches by detecting (max: 100%) poison samples in the training data. We hope this work can help practitioners notice the poison attack and inspire the design of more advanced defense techniques.

翻译：在软件工程界,最近对许多源代码处理任务应用了深层次学习(DL) 。由于 DL 模型的可解释性差, 其安全脆弱性需要检查。最近, 研究人员发现了一种突发安全威胁, 即毒物袭击。攻击者的目的是用毒物样本将隐蔽的后门注入模型中。中毒模型通常使用清洁投入, 但却通过嵌入触发器的有毒投入产生有针对性的错误结果。通过启动后门, 攻击者可以在安全相关情况下操纵有毒模型。为了验证现有源代码处理模型对毒物袭击的脆弱程度, 我们为名为Code Poisoner的源代码提出了一个毒物袭击框架, 名为Codepoisoner, 是一个强大的想象中的敌人。 codepolisoner能够通过用毒物样本毒物袭击模型来将隐藏在模型中。为了防范毒物袭击数据袭击, 我们用代码代码和代码可以有效地用于三个任务, 包括: 测得的DNA检测方法; 测算方法; 测算方法: 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测法; 测算方法; 测方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测; 测算; 测; 测; 测; 测; 测; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算;

相关内容

MoDELS

关注 43

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/