More Centralized Training, Still Decentralized Execution: Multi-Agent Conditional Policy Factorization


September 26, 2022


Jiangxing Wang, Deheng Ye, Zongqing Lu

From arXiv, 18 pages

In cooperative multi-agent reinforcement learning (MARL), combining value decomposition with actor-critic enables agents to learn stochastic policies, which are better suited to partially observable environments. Because the goal is to learn local policies that enable decentralized execution, agents are commonly assumed to be independent of each other, even during centralized training. However, this assumption may prevent agents from learning the optimal joint policy. To address this problem, we explicitly incorporate the dependency among agents into centralized training. Although this leads to the optimal joint policy, that policy may not be factorizable for decentralized execution. Nevertheless, we show theoretically that from such a joint policy we can always derive another joint policy that achieves the same optimality but can be factorized for decentralized execution. To this end, we propose multi-agent conditional policy factorization (MACPF), which uses more centralized training but still enables decentralized execution. We empirically evaluate MACPF on various cooperative MARL tasks and show that it achieves better performance or faster convergence than the baselines.
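The gap between a dependent joint policy and a factorized one can be seen in a toy example. The sketch below is not the MACPF algorithm and is not taken from the paper: the two-agent coordination game, the payoff table, and all function names are assumptions made up for illustration. It shows a dependent joint policy that is optimal, the loss incurred by naively taking the product of its marginals, and a different joint policy that is equally optimal yet factorizes into independent per-agent policies.

```python
from itertools import product

# Hypothetical 2-agent, 2-action cooperative matrix game (illustrative only):
# the team is rewarded only when both agents choose the same action.
PAYOFF = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}


def expected_return(joint):
    """Expected payoff of a joint policy given as {(a1, a2): probability}."""
    return sum(p * PAYOFF[a] for a, p in joint.items())


def dependent_joint(pi1, pi2_given_a1):
    """Dependent joint policy: pi(a1, a2) = pi1(a1) * pi2(a2 | a1)."""
    return {(a1, a2): pi1[a1] * pi2_given_a1[a1][a2]
            for a1, a2 in product(range(2), repeat=2)}


def independent_joint(pi1, pi2):
    """Factorized joint policy: pi(a1, a2) = pi1(a1) * pi2(a2)."""
    return {(a1, a2): pi1[a1] * pi2[a2]
            for a1, a2 in product(range(2), repeat=2)}


def marginals(joint):
    """Per-agent marginal action distributions of a joint policy."""
    pi1 = [sum(joint[(a1, a2)] for a2 in range(2)) for a1 in range(2)]
    pi2 = [sum(joint[(a1, a2)] for a1 in range(2)) for a2 in range(2)]
    return pi1, pi2


# Optimal but dependent: agent 1 mixes uniformly, agent 2 copies agent 1.
dependent = dependent_joint([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])

# Dropping the dependency (product of marginals) is no longer optimal.
naive = independent_joint(*marginals(dependent))

# A different joint policy with the same return that factorizes exactly.
factorized = independent_joint([1.0, 0.0], [1.0, 0.0])

print(expected_return(dependent))   # 1.0
print(expected_return(naive))       # 0.5
print(expected_return(factorized))  # 1.0
```

In this toy game the factorized policy was picked by hand; how MACPF obtains such a policy through training is described in the paper itself and is not reproduced here.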

