Multi-agent RL is rendered difficult by the non-stationarity of the environment as perceived by individual agents. Theoretically sound methods using the REINFORCE estimator are impeded by its high variance, whereas value-function-based methods suffer from issues stemming from their ad hoc handling of situations such as inter-agent communication. Methods such as MADDPG are further constrained by their requirement of centralized critics. To address these issues, we present MA-Dreamer, a model-based method that uses both agent-centric and global differentiable models of the environment to train decentralized agents' policies and critics from model rollouts, i.e. `imagination'. Since only the model training is done off-policy, inter-agent communication/coordination and `language emergence' can be handled in a straightforward manner. We compare the performance of MA-Dreamer with other methods on two soccer-based games. Our experiments show that in long-term speaker-listener tasks and in cooperative games with strong partial observability, MA-Dreamer finds a solution that makes effective use of coordination, whereas competing methods obtain only marginal scores or fail outright, respectively. By effectively achieving coordination and communication under more relaxed and general conditions, our method opens the door to the study of more complex problems and population-based training.