Large Language Models (LLMs) perform well on language tasks but often lack collaborative awareness and struggle to optimize global performance in multi-agent settings. We present a reinforcement learning-augmented LLM agent framework that formulates cooperation as a decentralized partially observable Markov decision process (Dec-POMDP) and adopts centralized training with decentralized execution (CTDE). We introduce Group Relative Policy Optimization (GRPO) to jointly optimize agent policies with access to global signals during training, together with a simplified joint reward that balances task quality, speed, and coordination cost. On collaborative writing and coding benchmarks, our framework delivers a 3x increase in task processing speed over single-agent baselines, 98.7% structural and stylistic consistency in writing, and a 74.6% test pass rate in coding. The approach consistently outperforms strong multi-agent LLM baselines and offers a practical path toward reliable collaboration in complex workflows.
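As a rough illustration only (not the paper's exact formulation), a simplified joint reward of the kind described here could take a weighted additive form, where the weights $\alpha$, $\beta$, $\gamma$ and the individual component terms are assumptions introduced for exposition:
$$
R_{\text{joint}} \;=\; \alpha\, R_{\text{quality}} \;+\; \beta\, R_{\text{speed}} \;-\; \gamma\, C_{\text{coordination}},
$$
with $R_{\text{quality}}$ scoring task output quality, $R_{\text{speed}}$ rewarding faster completion, and $C_{\text{coordination}}$ penalizing inter-agent communication overhead.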