Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data

Synthetic data refers to artificial samples generated by models. Although it has been shown to significantly improve the performance of large language models (LLMs) during training and has been widely adopted in LLM development, the security risks it may introduce remain largely uninvestigated. This paper systematically evaluates the resilience of the synthetic-data-integrated training paradigm for LLMs against mainstream poisoning and backdoor attacks. We find that this paradigm is strongly resistant to existing attacks, primarily because poisoning data and the queries used to generate synthetic samples follow different distribution patterns. To make these attacks more effective and to further investigate the security risks introduced by synthetic data, we introduce a novel and universal attack framework, the Virus Infection Attack (VIA), which enables existing attacks to propagate through synthetic data even when the generation queries are entirely clean. Inspired by the principles of virus design in cybersecurity, VIA conceals the poisoning payload within a protective "shell" and strategically searches for optimal hijacking points in benign samples to maximize the likelihood of generating malicious content. Extensive experiments on both data poisoning and backdoor attacks show that VIA significantly increases the presence of poisoning content in synthetic data and correspondingly raises the attack success rate (ASR) on downstream models to levels comparable to those observed in the poisoned upstream models.
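The abstract reports two quantities: how often poisoning content appears in synthetic data, and the attack success rate (ASR) of downstream models. The sketch below shows one simple way such rates could be computed. It assumes a plain case-insensitive substring match as the success criterion. The abstract does not state the paper's actual criterion, and the helper names `contains_payload`, `attack_success_rate`, and `infection_rate` are hypothetical.

```python
from typing import Callable, Sequence


def contains_payload(text: str, payload: str) -> bool:
    # Hypothetical success criterion (assumption): the payload string
    # appears verbatim in the text, ignoring case.
    return payload.lower() in text.lower()


def attack_success_rate(
    outputs: Sequence[str],
    payload: str,
    is_success: Callable[[str, str], bool] = contains_payload,
) -> float:
    # ASR = (# outputs judged to exhibit the attack behavior) / (# outputs).
    if not outputs:
        return 0.0
    hits = sum(1 for out in outputs if is_success(out, payload))
    return hits / len(outputs)


def infection_rate(synthetic_samples: Sequence[str], payload: str) -> float:
    # Share of synthetic samples that carry the payload, using the same
    # (assumed) criterion as attack_success_rate.
    return attack_success_rate(synthetic_samples, payload)


if __name__ == "__main__":
    outputs = ["Visit example.com for details", "The answer is 4", "see EXAMPLE.COM"]
    print(round(attack_success_rate(outputs, "example.com"), 4))  # 0.6667
```

Passing a different `is_success` function, such as a classifier-based judge, swaps the criterion without changing the rate computation. With such a function, the upstream and downstream ASR values can be compared on the same footing.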

