Synthetic data refers to artificial samples generated by models. While synthetic data has been shown to significantly enhance the performance of large language models (LLMs) during training and is widely used in LLM development, the security risks it may introduce remain largely uninvestigated. This paper systematically evaluates the resilience of the synthetic-data-integrated training paradigm for LLMs against mainstream poisoning and backdoor attacks. We reveal that this paradigm exhibits strong resistance to existing attacks, primarily because of the distributional mismatch between poisoning data and the queries used to generate synthetic samples. To enhance the effectiveness of these attacks and further investigate the security risks introduced by synthetic data, we introduce a novel and universal attack framework, the Virus Infection Attack (VIA), which enables existing attacks to propagate through synthetic data even under purely clean queries. Inspired by the principles of virus design in cybersecurity, VIA conceals the poisoning payload within a protective "shell" and strategically searches for optimal hijacking points in benign samples to maximize the likelihood of generating malicious content. Extensive experiments on both data poisoning and backdoor attacks show that VIA significantly increases the presence of poisoning content in synthetic data and correspondingly raises the attack success rate (ASR) on downstream models to levels comparable to those observed on the poisoned upstream models.
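For concreteness, the minimal sketch below illustrates the two mechanisms named above: wrapping a poisoning payload in a protective "shell" and selecting the hijack point in a benign sample that maximizes a propagation score. All identifiers (`wrap_in_shell`, `best_hijack_point`, `infect`) and the scoring heuristic are hypothetical placeholders assumed for illustration, not the paper's actual implementation.

```python
# Illustrative sketch of the VIA idea under stated assumptions:
# (1) disguise a poisoning payload inside innocuous-looking context (the "shell"),
# (2) splice it into a benign sample at the position a scoring heuristic
#     predicts the generator is most likely to reproduce.
from typing import Callable, List

def wrap_in_shell(payload: str) -> str:
    """Disguise the payload as innocuous context (hypothetical shell format)."""
    return f"[Note: {payload}]"

def best_hijack_point(sentences: List[str],
                      score: Callable[[List[str], int], float]) -> int:
    """Return the insertion index that maximizes the propagation score."""
    return max(range(len(sentences) + 1), key=lambda i: score(sentences, i))

def infect(benign_sample: str, payload: str,
           score: Callable[[List[str], int], float]) -> str:
    """Insert the shelled payload into the benign sample at the best hijack point."""
    sentences = benign_sample.split(". ")
    idx = best_hijack_point(sentences, score)
    sentences.insert(idx, wrap_in_shell(payload))
    return ". ".join(sentences)

if __name__ == "__main__":
    # Placeholder score preferring late positions; a real attack would use a
    # model-based estimate of how likely the generator is to copy the payload.
    naive_score = lambda sents, i: float(i)
    print(infect("The capital of France is Paris. It lies on the Seine",
                 "POISONED PAYLOAD", naive_score))
```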