In the software engineering community, deep learning (DL) has recently been applied to many source code processing tasks. Due to the poor interpretability of DL models, their security vulnerabilities require scrutiny. Recently, researchers have identified an emergent security threat, namely poison attack. The attackers aim to inject insidious backdoors into models by poisoning the training data with poison samples. Poisoned models work normally with clean inputs but produce targeted erroneous results with poisoned inputs embedded with triggers. By activating backdoors, attackers can manipulate the poisoned models in security-related scenarios. To verify the vulnerability of existing deep source code processing models to the poison attack, we present a poison attack framework for source code named CodePoisoner as a strong imaginary enemy. CodePoisoner can produce compilable even human-imperceptible poison samples and attack models by poisoning the training data with poison samples. To defend against the poison attack, we further propose an effective defense approach named CodeDetector to detect poison samples in the training data. CodeDetector can be applied to many model architectures and effectively defend against multiple poison attack approaches. We apply our CodePoisoner and CodeDetector to three tasks, including defect detection, clone detection, and code repair. The results show that (1) CodePoisoner achieves a high attack success rate (max: 100%) in misleading models to targeted erroneous behaviors. It validates that existing deep source code processing models have a strong vulnerability to the poison attack. (2) CodeDetector effectively defends against multiple poison attack approaches by detecting (max: 100%) poison samples in the training data. We hope this work can help practitioners notice the poison attack and inspire the design of more advanced defense techniques.
翻译:在软件工程界,最近对许多源代码处理任务应用了深层次学习(DL) 。 由于 DL 模型的可解释性差, 其安全脆弱性需要检查。 最近, 研究人员发现了一种突发安全威胁, 即毒物袭击。 攻击者的目的是用毒物样本将隐蔽的后门注入模型中。 中毒模型通常使用清洁投入, 但却通过嵌入触发器的有毒投入产生有针对性的错误结果。 通过启动后门, 攻击者可以在安全相关情况下操纵有毒模型。 为了验证现有源代码处理模型对毒物袭击的脆弱程度, 我们为名为Code Poisoner的源代码提出了一个毒物袭击框架, 名为Codepoisoner, 是一个强大的想象中的敌人。 codepolisoner能够通过用毒物样本毒物袭击模型来将隐藏在模型中。 为了防范毒物袭击数据袭击, 我们用代码代码和代码可以有效地用于三个任务, 包括: 测得的DNA检测方法; 测算方法; 测算方法: 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测法; 测算方法; 测方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测算方法; 测; 测算; 测; 测; 测; 测; 测; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算; 测算;