GraphTracer: Graph-Guided Failure Tracing in LLM Agents for Robust Multi-Turn Deep Search

From arXiv. This submission has been withdrawn by the authors due to a fundamental error in the methodology that affects the validity of the main results.

Multi-agent systems powered by Large Language Models excel at complex tasks through coordinated collaboration, yet they fail at high rates in multi-turn deep search scenarios. Existing temporal attribution methods struggle to diagnose root causes accurately, particularly when errors propagate across multiple agents. Attempts to automate failure attribution by analyzing action sequences remain ineffective because they cannot account for information dependencies that span agents. This paper identifies two core challenges: (i) distinguishing symptoms from root causes in multi-agent error propagation, and (ii) tracing information dependencies beyond temporal order. To address these issues, we introduce GraphTracer, a framework that redefines failure attribution through information flow analysis. GraphTracer constructs Information Dependency Graphs (IDGs) to explicitly capture how agents reference and build on prior outputs, and it localizes root causes by tracing through these dependency structures instead of relying on temporal sequences. GraphTracer also uses graph-aware synthetic data generation that targets critical nodes to create realistic failure scenarios. Evaluations on the Who&When benchmark and integration into production systems demonstrate that GraphTracer-8B achieves up to 18.18% higher attribution accuracy than state-of-the-art models and enables performance improvements of 4.8% to 14.2% in deployed multi-agent frameworks, establishing a robust solution for multi-agent system debugging.
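The abstract does not spell out how an IDG is built or traversed, so the following is only a minimal sketch of the general idea of tracing dependencies instead of time order. Everything in it is an assumption made for illustration: the `Step` record, the `refs` and `faulty` fields, and the rule that a root cause is a faulty step none of whose direct inputs are faulty. It is not the authors' method.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One agent output in a trace (hypothetical structure, not from the paper)."""
    step_id: int                                    # position in temporal order
    agent: str
    refs: list[int] = field(default_factory=list)   # earlier steps this output builds on
    faulty: bool = False                            # assumed to come from some external checker


def ancestors(steps: dict[int, Step], start: int) -> set[int]:
    """Return every step reachable from `start` by following `refs` edges backward."""
    seen: set[int] = set()
    stack = [start]
    while stack:
        current = stack.pop()
        for ref in steps[current].refs:
            if ref not in seen:
                seen.add(ref)
                stack.append(ref)
    return seen


def root_causes(steps: dict[int, Step], failed_id: int) -> list[int]:
    """Faulty steps upstream of the failure whose direct inputs are all non-faulty."""
    reachable = ancestors(steps, failed_id) | {failed_id}
    return sorted(
        sid for sid in reachable
        if steps[sid].faulty and not any(steps[r].faulty for r in steps[sid].refs)
    )


def temporal_first_fault(steps: dict[int, Step]) -> int:
    """Baseline for comparison: blame the earliest faulty step in time order."""
    return min(sid for sid, step in steps.items() if step.faulty)


# Toy trace: step 1 is faulty but nothing downstream uses it;
# step 2 introduces the error that steps 3 and 4 inherit.
TRACE = {
    0: Step(0, "planner"),
    1: Step(1, "searcher_a", refs=[0], faulty=True),
    2: Step(2, "searcher_b", refs=[0], faulty=True),
    3: Step(3, "summarizer", refs=[2], faulty=True),
    4: Step(4, "answerer", refs=[3], faulty=True),
}

print("dependency-based root causes:", root_causes(TRACE, failed_id=4))
print("temporal baseline:", temporal_first_fault(TRACE))
```

In this toy trace the temporal baseline blames step 1, which the final answer never used, while the dependency walk treats steps 3 and 4 as symptoms and points at step 2. That is the distinction between symptoms and root causes the abstract refers to, shown here under the assumptions stated above.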
