An Agentic AI Workflow (AAW), also known as an LLM-based multi-agent system, is an autonomous system that assembles several LLM-based agents to collaborate toward a shared goal. The high autonomy, widespread adoption, and growing interest in AAWs highlight the need for a deeper understanding of how they operate, from both a quality and a security perspective. To date, no existing method assesses the influence of each agent on an AAW's final output. Adopting techniques from related fields is not feasible, since existing methods perform only static structural analysis, which is unsuitable for inference-time execution. We present the Counterfactual-based Agent Influence Ranker (CAIR), the first method for assessing how much each agent influences an AAW's output and for determining which agents are the most influential. By performing counterfactual analysis, CAIR provides a task-agnostic analysis that can be used both offline and at inference time. We evaluate CAIR on an AAW dataset we created, containing 30 different use cases with 230 different functionalities. Our evaluation shows that CAIR produces consistent rankings, outperforms baseline methods, and can easily enhance the effectiveness and relevance of downstream tasks.
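The abstract does not spell out CAIR's procedure. The sketch that follows is only a rough illustration of the general idea of counterfactual influence ranking, not CAIR's actual algorithm. It rests on several assumptions. The workflow is a hypothetical sequential chain of deterministic toy agents. The counterfactual for an agent is "skip it and pass its input through unchanged." Influence is measured as `1 - difflib.SequenceMatcher.ratio()` between the baseline and counterfactual final outputs. The names `run_workflow`, `output_distance`, and `rank_agents` are illustrative. Real LLM agents are stochastic, so a practical version would presumably need repeated runs and a semantic rather than character-level distance.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Tuple

# Hypothetical agent type: maps the upstream text to this agent's output text.
Agent = Callable[[str], str]
Workflow = List[Tuple[str, Agent]]


def run_workflow(agents: Workflow, user_input: str, skip: Optional[str] = None) -> str:
    """Run agents in sequence; if `skip` names an agent, that agent is replaced
    by a pass-through (the assumed counterfactual intervention)."""
    text = user_input
    for name, agent in agents:
        if name == skip:
            continue  # counterfactual: the agent forwards its input unchanged
        text = agent(text)
    return text


def output_distance(a: str, b: str) -> float:
    """Return 1 - similarity ratio, so 0.0 means the final outputs are identical."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def rank_agents(agents: Workflow, user_input: str) -> List[Tuple[str, float]]:
    """Score each agent by how far the final output moves when it is skipped,
    then sort from most to least influential."""
    baseline = run_workflow(agents, user_input)
    scores = [
        (name, output_distance(baseline, run_workflow(agents, user_input, skip=name)))
        for name, _ in agents
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy, deterministic stand-ins for LLM agents (purely illustrative).
    workflow: Workflow = [
        ("retriever", lambda q: q + " | paris is the capital of france"),
        ("summarizer", lambda s: s.split(" | ")[-1]),
        ("logger", lambda s: s),  # pass-through agent: should have zero influence
    ]
    for name, score in rank_agents(workflow, "capital of france?"):
        print(f"{name}: {score:.3f}")
```

Running the script should rank `retriever` first and `summarizer` second, with the pass-through `logger` at 0.000, because skipping it leaves the final output unchanged. Skipping an agent is only one possible choice of counterfactual. Substituting a perturbed or adversarial output would probe a different notion of influence.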