Recent progress in large language code models (LLCMs) has led to their widespread adoption in software development. Nevertheless, it is widely known that training a well-performing LLCM requires extensive human effort for data collection and high-quality annotation. Additionally, the training dataset may be proprietary (or only partially released to the public), and training is often conducted on large-scale GPU clusters at high cost. Inspired by the recent success of imitation attacks in extracting computer vision and natural language models, this work launches the first imitation attack on LLCMs: by querying a target LLCM with carefully designed queries and collecting its outputs, an adversary can train an imitation model whose behavior closely matches that of the target LLCM. We systematically investigate the effectiveness of imitation attacks under different query schemes and different LLCM tasks. We also design novel methods to polish the LLCM outputs, resulting in an effective imitation training process. We summarize our findings and the lessons learned in this study, which can help better depict the attack surface of LLCMs. Our research contributes to the growing body of knowledge on imitation attacks and defenses against deep neural models, particularly in the domain of code-related tasks.
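A minimal sketch of the query-and-collect step described above, under the assumption of black-box access to the target model. Here query_target_llcm is a hypothetical stand-in for the victim LLCM's API (it is not part of this work), and the subsequent fine-tuning of the imitation model on the harvested pairs is omitted.

from typing import Callable, List, Tuple

def collect_imitation_data(
    queries: List[str],
    query_target_llcm: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Query the target LLCM with crafted inputs and record (query, output) pairs."""
    dataset = []
    for q in queries:
        output = query_target_llcm(q)    # black-box call to the victim model
        dataset.append((q, output))      # harvested supervision for the imitation model
    return dataset

if __name__ == "__main__":
    # Hypothetical example: the target is mocked by a function that echoes the prompt.
    mock_target = lambda q: f"# completion for: {q}"
    queries = ["def quicksort(arr):", "def binary_search(nums, target):"]
    for q, out in collect_imitation_data(queries, mock_target):
        print(q, "->", out)
    # The collected pairs would then be used to fine-tune a local model with a
    # standard supervised training loop, which this sketch does not show.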