AgentGit: A Version Control Framework for Reliable and Scalable LLM-Powered Multi-Agent Systems

With the rapid progress of large language models (LLMs), LLM-powered multi-agent systems (MAS) are drawing increasing interest across academia and industry. However, many current MAS frameworks struggle with reliability and scalability, especially on complex tasks. We present AgentGit, a framework that brings Git-like rollback and branching to MAS workflows. Built as an infrastructure layer on top of LangGraph, AgentGit supports state commit, revert, and branching, allowing agents to traverse, compare, and explore multiple trajectories efficiently. To evaluate AgentGit, we designed an experiment that optimizes target agents by selecting better prompts. We ran a multi-step A/B test against three baselines -- LangGraph, AutoGen, and Agno -- on a real-world task: retrieving and analyzing paper abstracts. Results show that AgentGit significantly reduces redundant computation, lowers runtime and token usage, and supports parallel exploration across multiple branches, enhancing both reliability and scalability in MAS development. This work offers a practical path to more robust MAS design and enables error recovery, safe exploration, iterative debugging, and A/B testing in collaborative AI systems.

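To make the commit, revert, and branch operations described in the abstract concrete, here is a minimal Python sketch of the general idea. It is not AgentGit's API and it does not use LangGraph: the names `Commit`, `StateRepo`, `commit`, `revert`, and `branch`, and the example state keys and prompts, are all hypothetical and chosen only for illustration. The sketch assumes agent state is a plain dictionary that can be deep-copied.

```python
from __future__ import annotations

import copy
import itertools
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Commit:
    """One snapshot of agent state (hypothetical structure, not AgentGit's)."""
    commit_id: int
    parent_id: int | None
    state: dict[str, Any]
    message: str


class StateRepo:
    """Toy in-memory store illustrating commit / revert / branch."""

    def __init__(self, initial_state: dict[str, Any]) -> None:
        self._ids = itertools.count(1)
        self.commits: dict[int, Commit] = {}
        self.branches: dict[str, int | None] = {"main": None}
        self.current_branch = "main"
        self.state = copy.deepcopy(initial_state)
        self.commit("initial state")

    def commit(self, message: str) -> int:
        """Snapshot the working state on the current branch."""
        commit_id = next(self._ids)
        parent_id = self.branches[self.current_branch]
        snapshot = copy.deepcopy(self.state)
        self.commits[commit_id] = Commit(commit_id, parent_id, snapshot, message)
        self.branches[self.current_branch] = commit_id
        return commit_id

    def revert(self, commit_id: int) -> None:
        """Restore the working state from an earlier commit."""
        self.state = copy.deepcopy(self.commits[commit_id].state)
        self.branches[self.current_branch] = commit_id

    def branch(self, name: str, from_commit: int) -> None:
        """Start a new branch at an existing commit and switch to it."""
        if name in self.branches:
            raise ValueError(f"branch {name!r} already exists")
        self.branches[name] = from_commit
        self.current_branch = name
        self.state = copy.deepcopy(self.commits[from_commit].state)


# Usage: run the shared retrieval step once, then try two prompts
# on separate branches that both start from the same commit.
repo = StateRepo({"abstracts": [], "prompt": None})
repo.state["abstracts"] = ["paper A", "paper B"]  # stands in for retrieval
retrieved = repo.commit("retrieved abstracts")

results: dict[str, int] = {}
for name, prompt in [("prompt-a", "Summarize briefly."),
                     ("prompt-b", "List key findings.")]:
    repo.branch(name, retrieved)
    repo.state["prompt"] = prompt
    results[name] = repo.commit(f"analysis with {name}")
```

Both branches share the `retrieved` commit as their parent, so the retrieval step is not repeated for each prompt variant. This is the kind of reuse the abstract credits for reduced redundant computation in A/B testing, shown here only in toy form.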