通过合作性多剂行为者-批评方法的 " 闻名 " 有利价值影响政策 (Policy Perturbation via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods) - 专知论文

会员服务 ·

0

Performer · 相互独立的 · 优化器 · 价值函数 · Better ·

2021 年 9 月 8 日

Policy Perturbation via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods

翻译：通过合作性多剂行为者-批评方法的 " 闻名 " 有利价值影响政策

Jian Hu,Siyue Hu,Shih-wei Liao

from arxiv, update

Recent works have applied the Proximal Policy Optimization (PPO) to the multi-agent cooperative tasks, such as Independent PPO (IPPO); and vanilla Multi-agent PPO (MAPPO) which has a centralized value function. However, previous literature shows that MAPPO may not perform as well as Independent PPO (IPPO) and the Fine-tuned QMIX on Starcraft Multi-Agent Challenge (SMAC). MAPPO-Feature-Pruned (MAPPO-FP) improves the performance of MAPPO by the carefully designed agent-specific features. In addition, there is no literature that gives a theoretical analysis of the working mechanism of MAPPO. In this paper, we firstly theoretically generalize single-agent PPO to the MAPPO, which shows that the MAPPO is approximately equivalent to optimizing a multi-agent joint policy with the original PPO. Secondly, we find that MAPPO faces the problem of \textit{The Policies Overfitting in Multi-agent Cooperation(POMAC)}, as they learn policies by the sampled centralized advantage values. Then POMAC may lead to updating the multi-agent policies in a suboptimal direction and prevent the agents from exploring better trajectories. To solve this problem, we propose two novel policy perturbation methods, i.e, Noisy-Value MAPPO (NV-MAPPO) and Noisy-Advantage MAPPO (NA-MAPPO), which disturb the advantage values via random Gaussian noise. The experimental results show that our methods without agent-specific features outperform the Fine-tuned QMIX, MAPPO-FP, and achieves SOTA on SMAC. We open-source the code at \url{https://github.com/hijkzzz/noisy-mappo}.

翻译：最近的著作应用了Proximal 政策优化(PPO) 来完成多试剂合作任务,例如独立 PPO(IPPO) 和具有集中值功能的香草多试 PPO(MAPO) 。但是,以前的文献显示,MAPO可能不会像独立PPPO(IPPO) 和关于星际车道多点挑战(SMAAC) 的微调 QMIX 。 MAPPO- Fater-Pruned (MAPO-PFP) 以精心设计的代理特有特点改进了MAPO的性能。此外,没有任何文献对MAPO的工作机制进行理论分析。在本文中,我们首先理论上将单剂PAPPO(PO) 普遍化为MAPO(IPPO), 这相当于优化与原始PPPO的多点联合政策。第二,我们发现MAPO(W) 面临\ 公开性 (PO- 政策在多试剂合作(PO-MACT) 上过度调整政策。

0

相关内容

Performer

【NeurIPS 2021】设置多智能体策略梯度的方差

【NeurIPS 2021】设置多智能体策略梯度的方差

专知会员服务

21+阅读 · 2021年10月24日

【因果基础】Causality Basics，36页ppt

专知会员服务

52+阅读 · 2021年8月8日

策略梯度方法的算子视图，An operator view of policy gradient methods

策略梯度方法的算子视图，An operator view of policy gradient methods

专知会员服务

11+阅读 · 2020年6月23日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

专知会员服务

24+阅读 · 2019年11月11日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

强化学习扫盲贴：从Q-learning到DQN

强化学习扫盲贴：从Q-learning到DQN

夕小瑶的卖萌屋

52+阅读 · 2019年10月13日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

OpenAI官方发布：强化学习中的关键论文

OpenAI官方发布：强化学习中的关键论文

专知

14+阅读 · 2018年12月12日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Hindsight Goal Ranking on Replay Buffer for Sparse Reward Environment

Arxiv

0+阅读 · 2021年10月28日

Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction

Arxiv

0+阅读 · 2021年10月27日

Information-theoretic generalization bounds for black-box learning algorithms

Arxiv

12+阅读 · 2021年10月4日

Settling the Variance of Multi-Agent Policy Gradients

Arxiv

8+阅读 · 2021年8月20日

Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning

Arxiv

7+阅读 · 2021年6月22日

Residual Policy Learning

Residual Policy Learning

Arxiv

4+阅读 · 2018年12月15日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

Arxiv

6+阅读 · 2018年3月30日

Parameter Space Noise for Exploration

Arxiv

3+阅读 · 2018年1月31日

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Arxiv

6+阅读 · 2018年1月16日

VIP会员

文章信息

相关主题

相互独立的

相关VIP内容

【NeurIPS 2021】设置多智能体策略梯度的方差

【NeurIPS 2021】设置多智能体策略梯度的方差

专知会员服务

21+阅读 · 2021年10月24日

【因果基础】Causality Basics，36页ppt

专知会员服务

52+阅读 · 2021年8月8日

策略梯度方法的算子视图，An operator view of policy gradient methods

策略梯度方法的算子视图，An operator view of policy gradient methods

专知会员服务

11+阅读 · 2020年6月23日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

专知会员服务

24+阅读 · 2019年11月11日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

新书册《几何深度学习的数学基础》

中程单向攻击无人机的战略意义：俄乌战争启示

在无标注条件下适配视觉—语言模型：全面综述

面向视觉语言模型的持续学习：遗忘之外的综述与分类体系

相关资讯

强化学习扫盲贴：从Q-learning到DQN

强化学习扫盲贴：从Q-learning到DQN

夕小瑶的卖萌屋

52+阅读 · 2019年10月13日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

OpenAI官方发布：强化学习中的关键论文

OpenAI官方发布：强化学习中的关键论文

专知

14+阅读 · 2018年12月12日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Hindsight Goal Ranking on Replay Buffer for Sparse Reward Environment

Arxiv

0+阅读 · 2021年10月28日

Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction

Arxiv

0+阅读 · 2021年10月27日

Information-theoretic generalization bounds for black-box learning algorithms

Arxiv

12+阅读 · 2021年10月4日

Settling the Variance of Multi-Agent Policy Gradients

Arxiv

8+阅读 · 2021年8月20日

Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning

Arxiv

7+阅读 · 2021年6月22日

Residual Policy Learning

Residual Policy Learning

Arxiv

4+阅读 · 2018年12月15日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

Arxiv

6+阅读 · 2018年3月30日

Parameter Space Noise for Exploration

Arxiv

3+阅读 · 2018年1月31日

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Arxiv

6+阅读 · 2018年1月16日

微信扫码咨询专知VIP会员