Open Agent Specification (Agent Spec): A Unified Representation for AI Agents

Soufiane Amini, Yassine Benajiba, Cesare Bernardis, Paul Cayet, Hassan Chafi, Abderrahim Fathan, Louis Faucon, Damien Hilloulin, Sungpack Hong, Ingo Kossyk, Tran Minh Son Le, Rhicheek Patra, Sujith Ravi, Jonas Schweizer, Jyotika Singh, Shailender Singh, Weiyi Sun, Kartik Talamadupula, Jerry Xu

The proliferation of agent frameworks has fragmented how agents are defined, executed, and evaluated. Existing systems differ in their abstractions, data flow semantics, and tool integrations, making it difficult to share or reproduce workflows. We introduce Open Agent Specification (Agent Spec), a declarative language that defines AI agents and agentic workflows in a way that is compatible across frameworks, promoting reusability, portability, and interoperability of AI agents. Agent Spec defines a common set of components, control and data flow semantics, and schemas that allow an agent to be defined once and executed across different runtimes. Agent Spec also introduces a standardized evaluation harness to assess agent behavior and agentic workflows across runtimes, analogous to how HELM and related harnesses standardized LLM evaluation, so that performance, robustness, and efficiency can be compared consistently across frameworks. We demonstrate this using four distinct runtimes (LangGraph, CrewAI, AutoGen, and WayFlow) evaluated on three benchmarks (SimpleQA Verified, $\tau^2$-Bench, and BIRD-SQL). We provide accompanying toolsets: a Python SDK (PyAgentSpec), a reference runtime (WayFlow), and adapters for popular frameworks (e.g., LangGraph, AutoGen, CrewAI). Agent Spec bridges the gap between model-centric and agent-centric standardization and evaluation, laying the groundwork for reliable, reusable, and portable agentic systems.
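To make the "define once, execute across runtimes" idea concrete, here is a minimal sketch in plain Python. It is not the PyAgentSpec API or the Agent Spec schema: the `ToolSpec` and `AgentSpec` classes, their field names, the model and tool names, and the two toy adapters are all hypothetical, chosen only to illustrate a declarative definition that is serialized once and consumed by more than one runtime.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class ToolSpec:
    """Hypothetical declarative tool description: a name plus typed inputs."""
    name: str
    inputs: Dict[str, str]


@dataclass
class AgentSpec:
    """Hypothetical framework-agnostic agent definition (not the real schema)."""
    name: str
    system_prompt: str
    llm: str
    tools: List[ToolSpec] = field(default_factory=list)


def to_json(spec: AgentSpec) -> str:
    """Serialize the definition to a portable JSON document."""
    return json.dumps(asdict(spec), sort_keys=True, indent=2)


def from_json(text: str) -> AgentSpec:
    """Rebuild the definition from its JSON form."""
    raw = json.loads(text)
    tools = [ToolSpec(**tool) for tool in raw.pop("tools")]
    return AgentSpec(tools=tools, **raw)


def load_for_runtime_a(text: str) -> dict:
    """Toy adapter: maps the shared spec onto one runtime's (made-up) config shape."""
    spec = from_json(text)
    return {"runtime": "A", "agent": spec.name, "tools": [t.name for t in spec.tools]}


def load_for_runtime_b(text: str) -> dict:
    """Toy adapter: a second runtime reading the very same document."""
    spec = from_json(text)
    return {"runtime": "B", "agent": spec.name, "model": spec.llm}


# Define the agent once, as data rather than framework-specific code.
spec = AgentSpec(
    name="sql_helper",
    system_prompt="Answer questions by writing SQL.",
    llm="example-model",  # placeholder model identifier
    tools=[ToolSpec(name="run_query", inputs={"sql": "string"})],
)
serialized = to_json(spec)

# Both adapters consume the same serialized definition.
print(load_for_runtime_a(serialized))
print(load_for_runtime_b(serialized))
```

Keeping the definition as serializable data is what would let a shared evaluation harness hold the agent fixed while swapping the runtime underneath it; the actual adapters described in the abstract are necessarily far more involved than these stubs.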
