Pre-trained models of code have achieved success in many important software engineering tasks. However, these powerful models are vulnerable to adversarial attacks that slightly perturb model inputs to make a victim model produce wrong outputs. Existing work mainly attacks models of code with examples that preserve operational program semantics but ignores a fundamental requirement for adversarial example generation: perturbations should be natural to human judges, which we refer to as the naturalness requirement. In this paper, we propose ALERT (nAturaLnEss AwaRe ATtack), a black-box attack that adversarially transforms inputs to make victim models produce wrong outputs. Unlike prior work, this paper considers the natural semantics of generated examples while also preserving the operational semantics of the original inputs. Our user study demonstrates that human developers consistently consider adversarial examples generated by ALERT to be more natural than those generated by the state-of-the-art work by Zhang et al., which ignores the naturalness requirement. On attacking CodeBERT, our approach can achieve attack success rates of 53.62%, 27.79%, and 35.78% across three downstream tasks: vulnerability prediction, clone detection and code authorship attribution. On GraphCodeBERT, our approach can achieve average success rates of 76.95%, 7.96% and 61.47% on the three tasks. The above outperforms the baseline by 14.07% and 18.56% on the two pre-trained models on average. Finally, we investigated the value of the generated adversarial examples for hardening victim models through an adversarial fine-tuning procedure and demonstrated that the accuracy of CodeBERT and GraphCodeBERT against ALERT-generated adversarial examples increased by 87.59% and 92.32%, respectively.