Semi-Centralised Multi-Agent Reinforcement Learning with Policy-Embedded Training - 专知论文

Learning · Agent · Performer · Reinforcement learning · Reducible

2 September 2022

Semi-Centralised Multi-Agent Reinforcement Learning with Policy-Embedded Training

Taher Jafferjee, Juliusz Ziomek, Tianpei Yang, Zipeng Dai, Jianhong Wang, Matthew Taylor, Kun Shao, Jun Wang, David Mguni

Centralised training (CT) is the basis for many popular multi-agent reinforcement learning (MARL) methods because it allows agents to quickly learn high-performing policies. However, CT relies on agents learning from one-off observations of other agents' actions at a given state. Because MARL agents explore and update their policies during training, these observations often provide poor predictions about other agents' behaviour and the expected return for a given action. CT methods therefore suffer from high variance and error-prone estimates, harming learning. CT methods also suffer from explosive growth in complexity due to the reliance on global observations, unless strong factorisation restrictions are imposed (e.g., monotonic reward functions for QMIX). We address these challenges with a new semi-centralised MARL framework that performs policy-embedded training and decentralised execution. Our method, policy embedded reinforcement learning algorithm (PERLA), is an enhancement tool for Actor-Critic MARL algorithms that leverages a novel parameter sharing protocol and policy embedding method to maintain estimates that account for other agents' behaviour. Our theory proves PERLA dramatically reduces the variance in value estimates. Unlike various CT methods, PERLA, which seamlessly adopts MARL algorithms, scales easily with the number of agents without the need for restrictive factorisation assumptions. We demonstrate PERLA's superior empirical performance and efficient scaling in benchmark environments including StarCraft Micromanagement II and Multi-agent Mujoco.
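
The variance argument in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a hypothetical two-agent game with a fixed payoff table `Q` and a fixed policy `PI_J` for the other agent, and compares a value estimate built from a single observed action of the other agent with one that averages over several actions sampled from that agent's policy. All names (`Q`, `PI_J`, `one_off_estimate`, `embedded_estimate`) are illustrative assumptions.

```python
import random
import statistics

# Hypothetical payoff table Q[a_i][a_j] for a two-agent, two-action toy game.
Q = [[1.0, -1.0],
     [0.0, 0.5]]

# Hypothetical policy of the other agent: probability of each action a_j.
PI_J = [0.7, 0.3]


def sample_other_action(rng):
    """Draw one action for the other agent from PI_J."""
    return rng.choices(range(len(PI_J)), weights=PI_J, k=1)[0]


def one_off_estimate(a_i, rng):
    """Value estimate built from a single observed action of the other agent."""
    return Q[a_i][sample_other_action(rng)]


def embedded_estimate(a_i, rng, k=32):
    """Value estimate averaged over k actions sampled from the other agent's policy."""
    return sum(Q[a_i][sample_other_action(rng)] for _ in range(k)) / k


def exact_value(a_i):
    """Exact expectation of Q over the other agent's policy, for reference."""
    return sum(p * Q[a_i][a_j] for a_j, p in enumerate(PI_J))


def estimator_variance(estimator, a_i, trials=2000, seed=0):
    """Empirical variance of an estimator over repeated independent draws."""
    rng = random.Random(seed)
    return statistics.pvariance([estimator(a_i, rng) for _ in range(trials)])


if __name__ == "__main__":
    a_i = 0
    print("exact value:        ", exact_value(a_i))
    print("one-off variance:   ", estimator_variance(one_off_estimate, a_i))
    print("embedded variance:  ", estimator_variance(embedded_estimate, a_i))
```

Both estimators target the same expected value, but averaging over sampled actions shrinks the variance roughly in proportion to the number of samples `k`. In this toy setting the other agent's policy is simply given; how PERLA obtains access to it (the parameter sharing protocol) is not specified in the abstract and is not modelled here.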

