Autonomous driving systems increasingly rely on multi-agent architectures powered by large language models (LLMs), where specialized agents collaborate to perceive, reason, and plan. A key component of these systems is the shared function library, a collection of software tools that agents use to process sensor data and navigate complex driving environments. Despite its critical role in agent decision-making, the function library remains an under-explored vulnerability. In this paper, we introduce FuncPoison, a novel poisoning-based attack targeting the function library to manipulate the behavior of LLM-driven multi-agent autonomous systems. FuncPoison exploits two key weaknesses in how agents access the function library: (1) agents rely on text-based instructions to select tools; and (2) these tools are activated using standardized command formats that attackers can replicate. By injecting malicious tools with deceptive instructions, FuncPoison manipulates one agent s decisions--such as misinterpreting road conditions--triggering cascading errors that mislead other agents in the system. We experimentally evaluate FuncPoison on two representative multi-agent autonomous driving systems, demonstrating its ability to significantly degrade trajectory accuracy, flexibly target specific agents to induce coordinated misbehavior, and evade diverse defense mechanisms. Our results reveal that the function library, often considered a simple toolset, can serve as a critical attack surface in LLM-based autonomous driving systems, raising elevated concerns on their reliability.
翻译:自动驾驶系统日益依赖于由大语言模型(LLM)驱动的多智能体架构,其中专用智能体通过协作实现感知、推理与规划。此类系统的核心组件之一是共享函数库——一个供智能体处理传感器数据与导航复杂驾驶环境的软件工具集合。尽管该函数库在智能体决策中扮演关键角色,其作为安全漏洞的潜在风险尚未得到充分探究。本文提出FuncPoison,一种针对函数库的新型毒化攻击方法,旨在操纵基于LLM的多智能体自动驾驶系统的行为。FuncPoison利用了智能体访问函数库时的两个关键弱点:(1) 智能体依赖基于文本的指令选择工具;(2) 这些工具通过标准化的命令格式激活,而攻击者可复制此类格式。通过注入带有欺骗性指令的恶意工具,FuncPoison能够操纵单个智能体的决策(例如对路况的误判),进而引发级联错误,误导系统中的其他智能体。我们在两个典型的多智能体自动驾驶系统上对FuncPoison进行了实验评估,结果表明该攻击能显著降低轨迹精度,灵活针对特定智能体诱发协同异常行为,并能规避多种防御机制。本研究揭示:通常被视为简单工具集的函数库,可能成为基于LLM的自动驾驶系统中的关键攻击面,这对其可靠性提出了更高层面的警示。