Value function factorization via centralized training and decentralized execution is a promising approach to cooperative multi-agent reinforcement learning tasks. QMIX, a representative method in this area, has become state-of-the-art and achieved the best performance on the StarCraft II micromanagement benchmark. However, the monotonic mixing of per-agent estimates in QMIX is known to restrict the joint action Q-values it can represent, and individual agents often lack sufficient global state information for value function estimation, resulting in suboptimal policies. To this end, we present LSF-SAC, a novel framework that features a variational inference-based information-sharing mechanism, providing extra state information to assist individual agents in value function factorization. We demonstrate that such latent individual state information sharing can significantly expand the representational power of value function factorization, while fully decentralized execution is still maintained in LSF-SAC through a soft actor-critic design. We evaluate LSF-SAC on the StarCraft II micromanagement challenge and demonstrate that it outperforms several state-of-the-art methods on challenging collaborative tasks. We further conduct extensive ablation studies to locate the key factors behind its performance improvements. We believe this new insight can lead to new local value estimation methods and variational deep learning algorithms. A demo video and the implementation code can be found at https://sites.google.com/view/sacmm.