Existing data poisoning attacks on retrieval-augmented generation (RAG) systems scale poorly because they require costly optimization of a poisoned document for each target phrase. We introduce Eyes-on-Me, a modular attack that decomposes an adversarial document into two reusable components: Attention Attractors and Focus Regions. The attractors are optimized once to direct model attention to the Focus Region. An attacker can then fill the Focus Region with semantic baits for the retriever or malicious instructions for the generator, adapting to new targets at near-zero cost. The optimization works by steering a small subset of attention heads that we empirically identify as strongly correlated with attack success. Across 18 end-to-end RAG settings (3 datasets $\times$ 2 retrievers $\times$ 3 generators), Eyes-on-Me raises the average attack success rate from 21.9% to 57.8% (+35.9 points, 2.6$\times$ that of prior work). A single optimized attractor transfers to unseen black-box retrievers and generators without retraining. Our findings establish a scalable paradigm for RAG data poisoning and show that modular, reusable components pose a practical threat to modern AI systems. They also reveal a strong link between attention concentration and model outputs, which may inform interpretability research.
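To make the notion of "attention concentration" on a Focus Region concrete, the following is a minimal sketch of one way it could be measured. It is not the paper's implementation: the function name `focus_attention_mass`, the nested-list layout `attn[head][query][key]`, and the choice to average over query positions are all assumptions made for illustration.

```python
from typing import List, Sequence


def focus_attention_mass(
    attn: Sequence[Sequence[Sequence[float]]],
    focus_start: int,
    focus_end: int,
) -> List[float]:
    """Return, per head, the mean fraction of attention placed on the focus span.

    Assumed (hypothetical) layout: attn[h][q][k] is the attention weight from
    query position q to key position k in head h, with each row summing to 1.
    The focus span covers key positions [focus_start, focus_end).
    """
    if not 0 <= focus_start < focus_end:
        raise ValueError("focus span must satisfy 0 <= focus_start < focus_end")

    masses = []
    for head in attn:
        # Attention mass that each query position puts on the focus span.
        per_query = [sum(row[focus_start:focus_end]) for row in head]
        # Average over query positions to get one score for this head.
        masses.append(sum(per_query) / len(per_query))
    return masses


# Toy example: 2 heads, 2 query positions, 4 key positions; focus span = keys 1..2.
toy_attn = [
    [[0.1, 0.4, 0.4, 0.1], [0.25, 0.25, 0.25, 0.25]],  # head 0: concentrated on focus
    [[0.7, 0.1, 0.1, 0.1], [0.4, 0.2, 0.2, 0.2]],      # head 1: mostly elsewhere
]
print(focus_attention_mass(toy_attn, 1, 3))
```

In this toy input, head 0 places roughly 0.65 of its attention on the focus span and head 1 roughly 0.3. A per-head score of this kind is one plausible starting point for studying which heads concentrate on a given region. How the paper actually selects and steers heads is not specified in this abstract.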