PPP在合作、多机构运动会中的惊人效果 (The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games) - 专知论文

会员服务 ·

0

Less · Performer · tuning · 学成 · Performance ·

2021 年 7 月 5 日

The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games

翻译：PPP在合作、多机构运动会中的惊人效果

Chao Yu,Akash Velu,Eugene Vinitsky,Yu Wang,Alexandre Bayen,Yi Wu

Proximal Policy Optimization (PPO) is a popular on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent settings. This is often due the belief that on-policy methods are significantly less sample efficient than their off-policy counterparts in multi-agent problems. In this work, we investigate Multi-Agent PPO (MAPPO), a variant of PPO which is specialized for multi-agent settings. Using a 1-GPU desktop, we show that MAPPO achieves surprisingly strong performance in three popular multi-agent testbeds: the particle-world environments, the Starcraft multi-agent challenge, and the Hanabi challenge, with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures. In the majority of environments, we find that compared to off-policy baselines, MAPPO achieves strong results while exhibiting comparable sample efficiency. Finally, through ablation studies, we present the implementation and algorithmic factors which are most influential to MAPPO's practical performance.

翻译：最佳政策优化(PPO)是一种受欢迎的政策强化学习算法,但在多试剂环境中的利用远远少于非政策学习算法,这往往是因为相信在多试剂问题上,政策性方法的抽样效率大大低于非政策性对应方。在这项工作中,我们调查多剂政策优化(MAPO)的变种PPPO(MAPO),该变种专门用于多剂性能。我们使用1-GPU桌面,显示MAPO在三种受欢迎的多剂试样中取得了惊人的强效性能:粒子-世界环境、星体多剂挑战以及汉纳比挑战,只有极少的超参数调整,而且没有任何特定领域的算法修改或结构。在大多数环境中,我们发现与非政策基线相比,MAPOPO在展示可比的样本效率的同时取得了显著的成果。最后,我们通过对比研究,介绍了对MAPPO的实际业绩最有影响力的执行和算法因素。

0

相关内容

Less

LESS 是一个开源的样式语言，受到 Sass 的影响。严格来说，LESS 是一个嵌套的元语言，符合语法规范的 CSS 语句也是符合规范的 Less 代码。

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

策略梯度方法的算子视图，An operator view of policy gradient methods

策略梯度方法的算子视图，An operator view of policy gradient methods

专知会员服务

11+阅读 · 2020年6月23日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【AAAI 2019 Tutorial】城市交通控制的规划与调度方法（Planning and Scheduling Approaches for Urban Traffic Control），Scott Sanner，Mauro Vallati，Stephen F. Smith

【AAAI 2019 Tutorial】城市交通控制的规划与调度方法（Planning and Scheduling Approaches for Urban Traffic Control），Scott Sanner，Mauro Vallati，Stephen F. Smith

专知会员服务

8+阅读 · 2019年11月18日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

灾难性遗忘问题新视角：迁移-干扰平衡

灾难性遗忘问题新视角：迁移-干扰平衡

CreateAMind

17+阅读 · 2019年7月6日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

腊月廿八 | 强化学习-TRPO和PPO背后的数学

腊月廿八 | 强化学习-TRPO和PPO背后的数学

AI研习社

18+阅读 · 2019年2月2日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

【强化学习】用于真实机器人的高效深度强化学习算法、全面解读深度强化学习

【强化学习】用于真实机器人的高效深度强化学习算法、全面解读深度强化学习

产业智能官

16+阅读 · 2018年12月27日

spinningup.openai 强化学习资源完整

spinningup.openai 强化学习资源完整

CreateAMind

6+阅读 · 2018年12月17日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

Policy Perturbation via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods

Policy Perturbation via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods

Arxiv

0+阅读 · 2021年9月8日

A Comparative Study of Nonlinear MPC and Differential-Flatness-Based Control for Quadrotor Agile Flight

A Comparative Study of Nonlinear MPC and Differential-Flatness-Based Control for Quadrotor Agile Flight

Arxiv

0+阅读 · 2021年9月6日

Multi-agent online learning in time-varying games

Arxiv

0+阅读 · 2021年9月4日

Multi-Agent Inverse Reinforcement Learning: Suboptimal Demonstrations and Alternative Solution Concepts

Arxiv

0+阅读 · 2021年9月2日

Density Constrained Reinforcement Learning

Arxiv

6+阅读 · 2021年6月24日

Multi-Agent Cooperative Bidding Games for Multi-Objective Optimization in e-Commercial Sponsored Search

Arxiv

12+阅读 · 2021年6月8日

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

Arxiv

3+阅读 · 2020年6月15日

The StarCraft Multi-Agent Challenge

The StarCraft Multi-Agent Challenge

Arxiv

3+阅读 · 2019年2月11日

Emergent Translation in Multi-Agent Communication

Arxiv

3+阅读 · 2018年4月11日

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Arxiv

6+阅读 · 2018年1月16日

VIP会员

文章信息

相关主题

相关VIP内容

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

策略梯度方法的算子视图，An operator view of policy gradient methods

策略梯度方法的算子视图，An operator view of policy gradient methods

专知会员服务

11+阅读 · 2020年6月23日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【AAAI 2019 Tutorial】城市交通控制的规划与调度方法（Planning and Scheduling Approaches for Urban Traffic Control），Scott Sanner，Mauro Vallati，Stephen F. Smith

【AAAI 2019 Tutorial】城市交通控制的规划与调度方法（Planning and Scheduling Approaches for Urban Traffic Control），Scott Sanner，Mauro Vallati，Stephen F. Smith

专知会员服务

8+阅读 · 2019年11月18日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

新书册《几何深度学习的数学基础》

中程单向攻击无人机的战略意义：俄乌战争启示

在无标注条件下适配视觉—语言模型：全面综述

面向视觉语言模型的持续学习：遗忘之外的综述与分类体系

相关资讯

灾难性遗忘问题新视角：迁移-干扰平衡

灾难性遗忘问题新视角：迁移-干扰平衡

CreateAMind

17+阅读 · 2019年7月6日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

腊月廿八 | 强化学习-TRPO和PPO背后的数学

腊月廿八 | 强化学习-TRPO和PPO背后的数学

AI研习社

18+阅读 · 2019年2月2日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

【强化学习】用于真实机器人的高效深度强化学习算法、全面解读深度强化学习

【强化学习】用于真实机器人的高效深度强化学习算法、全面解读深度强化学习

产业智能官

16+阅读 · 2018年12月27日

spinningup.openai 强化学习资源完整

spinningup.openai 强化学习资源完整

CreateAMind

6+阅读 · 2018年12月17日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Policy Perturbation via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods

Policy Perturbation via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods

Arxiv

0+阅读 · 2021年9月8日

A Comparative Study of Nonlinear MPC and Differential-Flatness-Based Control for Quadrotor Agile Flight

A Comparative Study of Nonlinear MPC and Differential-Flatness-Based Control for Quadrotor Agile Flight

Arxiv

0+阅读 · 2021年9月6日

Multi-agent online learning in time-varying games

Arxiv

0+阅读 · 2021年9月4日

Multi-Agent Inverse Reinforcement Learning: Suboptimal Demonstrations and Alternative Solution Concepts

Arxiv

0+阅读 · 2021年9月2日

Density Constrained Reinforcement Learning

Arxiv

6+阅读 · 2021年6月24日

Multi-Agent Cooperative Bidding Games for Multi-Objective Optimization in e-Commercial Sponsored Search

Arxiv

12+阅读 · 2021年6月8日

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

Arxiv

3+阅读 · 2020年6月15日

The StarCraft Multi-Agent Challenge

The StarCraft Multi-Agent Challenge

Arxiv

3+阅读 · 2019年2月11日

Emergent Translation in Multi-Agent Communication

Arxiv

3+阅读 · 2018年4月11日

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Arxiv

6+阅读 · 2018年1月16日

微信扫码咨询专知VIP会员