Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries

Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by incorporating external knowledge bases, but this may expose them to extraction attacks, leading to potential copyright and privacy risks. However, existing extraction methods typically rely on malicious inputs such as prompt injection or jailbreaking, which makes them easy to detect at the input or output level. In this paper, we introduce the Implicit Knowledge Extraction Attack (IKEA), which conducts knowledge extraction on RAG systems through benign queries. Specifically, IKEA first leverages anchor concepts (keywords related to internal knowledge) to generate natural-looking queries, and then designs two mechanisms that lead anchor concepts to thoroughly "explore" the RAG's knowledge: (1) Experience Reflection Sampling, which samples anchor concepts based on past query-response histories, ensuring their relevance to the topic; (2) Trust Region Directed Mutation, which iteratively mutates anchor concepts under similarity constraints to further exploit the embedding space. Extensive experiments demonstrate IKEA's effectiveness under various defenses, surpassing baselines by over 80% in extraction efficiency and 90% in attack success rate. Moreover, the substitute RAG system built from IKEA's extractions shows comparable performance to the original RAG and outperforms those based on baselines across multiple evaluation tasks, underscoring the stealthy copyright infringement risk in RAG systems.
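
The abstract only names the two mechanisms, so the following is a toy sketch of how they might fit together, not the paper's implementation. Everything in it is an assumption made for illustration: the hashed character-trigram `embed` function stands in for a real sentence encoder, and the function names `reflection_sample` and `trust_region_mutate`, the history format, and the `penalty`, `near`, and `radius` parameters are hypothetical.

```python
import math
import random
import zlib

DIM = 64  # size of the toy embedding (assumption, not from the paper)


def embed(text):
    """Toy embedding: hashed character-trigram counts, L2-normalised.

    A stand-in for a real sentence encoder; zlib.crc32 keeps it deterministic.
    """
    vec = [0.0] * DIM
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity of two already-normalised vectors."""
    return sum(x * y for x, y in zip(a, b))


def reflection_sample(candidates, history, rng, penalty=0.1, near=0.6):
    """Sample one anchor concept, informed by past query-response outcomes.

    history is a list of (anchor, was_informative) pairs. Assumption: a
    candidate that lies close to an anchor whose query was uninformative
    is down-weighted by `penalty` for each such anchor.
    """
    weights = []
    for cand in candidates:
        weight = 1.0
        cand_vec = embed(cand)
        for anchor, informative in history:
            if not informative and cosine(cand_vec, embed(anchor)) >= near:
                weight *= penalty
        weights.append(weight)
    return rng.choices(candidates, weights=weights, k=1)[0]


def trust_region_mutate(anchor, response, vocabulary, radius=0.3):
    """Mutate an anchor under a similarity constraint.

    Assumption: the "trust region" is the set of vocabulary words whose
    similarity to the last response is at least `radius`; within it, the
    word least similar to the current anchor is chosen, so the next query
    moves toward unexplored parts of the embedding space.
    Returns None when no word satisfies the constraint.
    """
    resp_vec = embed(response)
    anchor_vec = embed(anchor)
    region = [
        word for word in vocabulary
        if word != anchor and cosine(embed(word), resp_vec) >= radius
    ]
    if not region:
        return None
    return min(region, key=lambda word: cosine(embed(word), anchor_vec))
```

In a loop, one would presumably sample an anchor with `reflection_sample`, turn it into a natural-sounding question, record whether the response was informative, and then call `trust_region_mutate` on informative responses to obtain the next anchor. Query generation and the stopping rule are left out because the abstract does not specify them.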
