Reinforcement Learning-Augmented LLM Agents for Collaborative Decision Making and Performance Optimization

Coordination · Augmented · Agents · Collaboration · Programming

December 31, 2025

Dong Qiu, Duo Xu, Limengxi Yue

From arXiv; accepted by IEEE ICFTIC 2025

Large Language Models (LLMs) perform well in language tasks but often lack collaborative awareness and struggle to optimize global performance in multi-agent settings. We present a reinforcement learning-augmented LLM agent framework that formulates cooperation as a decentralized partially observable Markov decision process (Dec-POMDP) and adopts centralized training with decentralized execution (CTDE). We introduce Group Relative Policy Optimization (GRPO) to jointly optimize agent policies with access to global signals during training, together with a simplified joint reward that balances task quality, speed, and coordination cost. On collaborative writing and coding benchmarks, our framework delivers a 3x increase in task processing speed over single-agent baselines, 98.7% structural/style consistency in writing, and a 74.6% test pass rate in coding. The approach consistently outperforms strong multi-agent LLM baselines and provides a practical path toward reliable collaboration in complex workflows.
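To make the abstract's two training ingredients concrete, here is a minimal Python sketch of a joint reward and of group-relative advantages. It is an illustration under stated assumptions, not the authors' implementation: the linear form of `joint_reward`, its weights, and the example rollout values are all hypothetical, since the abstract only says that the reward balances task quality, speed, and coordination cost. The advantage step follows the usual GRPO idea of scoring each sampled rollout against the mean and standard deviation of its own group.

```python
from statistics import mean, pstdev


def joint_reward(quality, speed, coord_cost,
                 w_quality=1.0, w_speed=0.5, w_cost=0.2):
    """Hypothetical linear joint reward (form and weights are assumptions).

    Rewards task quality and speed, and penalizes coordination cost.
    """
    return w_quality * quality + w_speed * speed - w_cost * coord_cost


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Each advantage is (reward - group mean) / (group std + eps), so no
    learned value function is needed as a baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of joint rollouts for one task (values are made up).
rollouts = [
    {"quality": 0.9, "speed": 0.6, "coord_cost": 0.3},
    {"quality": 0.7, "speed": 0.9, "coord_cost": 0.1},
    {"quality": 0.5, "speed": 0.4, "coord_cost": 0.8},
    {"quality": 0.8, "speed": 0.7, "coord_cost": 0.5},
]

rewards = [joint_reward(**r) for r in rollouts]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.3f} advantage={a:+.3f}")
```

In a CTDE setup, a score like this would be computed only during training, where global signals are available; at execution time each agent would act on its own local observation.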

