We are currently witnessing dramatic advances in the capabilities of Large Language Models (LLMs). They are already being adopted in practice and integrated into many systems, including integrated development environments (IDEs) and search engines. The functionalities of current LLMs can be modulated via natural language prompts, while their exact internal functionality remains implicit and unassessable. This property, which makes them adaptable to even unseen tasks, might also make them susceptible to targeted adversarial prompting. Recently, several ways to misalign LLMs using Prompt Injection (PI) attacks have been introduced. In such attacks, an adversary can prompt the LLM to produce malicious content or override the original instructions and the employed filtering schemes. Recent work showed that these attacks are hard to mitigate, as state-of-the-art LLMs are instruction-following. So far, these attacks assumed that the adversary is directly prompting the LLM. In this work, we show that augmenting LLMs with retrieval and API calling capabilities (so-called Application-Integrated LLMs) induces a whole new set of attack vectors. These LLMs might process poisoned content retrieved from the Web that contains malicious prompts pre-injected and selected by adversaries. We demonstrate that an attacker can indirectly perform such PI attacks. Based on this key insight, we systematically analyze the resulting threat landscape of Application-Integrated LLMs and discuss a variety of new attack vectors. To demonstrate the practical viability of our attacks, we implemented specific demonstrations of the proposed attacks within synthetic applications. In summary, our work calls for an urgent evaluation of current mitigation techniques and an investigation of whether new techniques are needed to defend LLMs against these threats.
翻译:目前,大语言模型(LLMS)的能力正在取得巨大进步,它们已经在实践中被采用,并被纳入许多系统,包括综合开发环境(IDES)和搜索引擎。当前LLMS的功能可以通过自然语言提示进行调节,而其精确的内部功能仍然是隐含和不可评估的。这些属性使得它们适应甚至看不见的任务,还可能使其容易受到有针对性的对抗性刺激。最近,引入了使用迅速注射(PI)攻击来误导LMS的几种方法。在这类攻击中,敌国可以促使LLM产生恶意内容或超越原始指令和使用的过滤机制。最近的工作表明,这些攻击是很难减轻的,因为其准确的内部功能仍然是其内部功能。这些攻击假设,使敌人直接触发LMMM。在这项工作中,我们用检索和API呼叫能力(所谓的“应用-一体化LMS”)来强化新的攻击矢量。这些LMMS可能会在网络上对攻击进行毒害性分析,我们从攻击中提取的系统化攻击技术,我们从网络上对它进行精确的直观分析。我们进行这种攻击之前,我们所选择的磁性攻击的直观分析。