RIIT: Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning


April 19, 2021


Jian Hu, Siyang Jiang, Seth Austin Harding, Haibin Wu, Shih-wei Liao

From arXiv; oral talk at UMD RLSS; adds a theory of monotonic mixing networks.

Recent years have seen revolutionary breakthroughs in Multi-Agent Deep Reinforcement Learning (MADRL), with successful applications to complex scenarios such as computer games and robot swarms. We investigate the impact of "implementation tricks" on state-of-the-art (SOTA) QMIX-based algorithms. First, we find that tricks described as auxiliary details of the core algorithm, and seemingly of secondary importance, in fact have an enormous impact on performance. After minimal tuning, QMIX attains extraordinarily high win rates and achieves SOTA in the StarCraft Multi-Agent Challenge (SMAC). Furthermore, we find that QMIX's monotonicity constraint improves sample efficiency in certain cooperative tasks. To verify the importance of the monotonicity constraint, we propose a new policy-based algorithm, RIIT, which achieves SOTA among policy-based algorithms. Finally, we prove that purely cooperative tasks can be represented by monotonic mixing networks. We open-source the code at https://github.com/hijkzzz/pymarl2.
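The monotonicity constraint mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code from pymarl2. It is a toy, untrained QMIX-style mixer in pure Python, assuming linear hypernetworks and hypothetical names (`MonotonicMixer`, `q_tot`, `hyper_w1`, and so on). It only shows why taking the absolute value of state-conditioned mixing weights makes the joint value non-decreasing in each agent's value.

```python
import math
import random


def elu(x):
    """ELU activation; monotonically increasing in x."""
    return x if x > 0 else math.exp(x) - 1.0


def linear(weights, bias, vec):
    """Affine map: each row of `weights` dotted with `vec`, plus its bias."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


class MonotonicMixer:
    """Toy mixer: Q_tot = w2 . elu(W1 q + b1) + b2 (illustrative only).

    W1 and w2 come from hypothetical linear hypernetworks of the global
    state and are passed through abs(), so Q_tot cannot decrease when any
    single agent's Q-value increases.
    """

    def __init__(self, n_agents, state_dim, hidden=4, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.uniform(-1, 1) for _ in range(cols)]
                    for _ in range(rows)]

        self.n_agents, self.hidden = n_agents, hidden
        # Untrained hypernetwork parameters (assumption: single linear layers).
        self.hyper_w1 = rand(hidden * n_agents, state_dim)
        self.hyper_b1 = rand(hidden, state_dim)
        self.hyper_w2 = rand(hidden, state_dim)
        self.hyper_b2 = rand(1, state_dim)

    def q_tot(self, agent_qs, state):
        n, h = self.n_agents, self.hidden
        flat_w1 = linear(self.hyper_w1, [0.0] * (h * n), state)
        # abs() enforces non-negative mixing weights (the monotonicity constraint).
        w1 = [[abs(flat_w1[i * n + a]) for a in range(n)] for i in range(h)]
        b1 = linear(self.hyper_b1, [0.0] * h, state)
        w2 = [abs(w) for w in linear(self.hyper_w2, [0.0] * h, state)]
        b2 = linear(self.hyper_b2, [0.0], state)[0]
        hidden_out = [elu(x) for x in linear(w1, b1, agent_qs)]
        return sum(w * x for w, x in zip(w2, hidden_out)) + b2


mixer = MonotonicMixer(n_agents=3, state_dim=5)
state = [0.2, -0.4, 0.1, 0.9, -0.3]
low = mixer.q_tot([1.0, 0.5, -0.2], state)
high = mixer.q_tot([1.5, 0.5, -0.2], state)  # raise only agent 0's Q-value
print(high >= low)
```

Because the biases are unconstrained while the weights are non-negative, the mixer can still depend on the global state in arbitrary ways, yet the greedy action of each agent remains consistent with the greedy joint action.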

