Current approaches to multi-agent cooperation rely heavily on centralized mechanisms or explicit communication protocols to ensure convergence. This paper studies distributed multi-agent learning without centralized components or explicit communication, and examines the use of distribution matching to facilitate the coordination of independent agents. In the proposed scheme, each agent independently minimizes the distribution mismatch to the corresponding component of a target visitation distribution. The theoretical analysis shows that, under certain conditions, each agent minimizing its individual distribution mismatch leads to convergence to the joint policy that generated the target distribution. Further, if the target distribution comes from a joint policy that optimizes a cooperative task, then the optimal policy for a combination of this task reward and the distribution matching reward is that same joint policy. This insight is used to formulate a practical algorithm (DM$^2$), in which each individual agent matches a target distribution derived from concurrently sampled trajectories of a joint expert policy. Experimental validation on the StarCraft domain shows that combining (1) a task reward and (2) a distribution matching reward for expert demonstrations of the same task allows agents to outperform a naive distributed baseline. Additional experiments probe the conditions under which expert demonstrations must be sampled to obtain the learning benefits.
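The combined objective described above can be sketched as a per-agent reward that adds a distribution-matching bonus to the environment's task reward. The sketch below is illustrative only: the function names, the mixing coefficient `lam`, and the use of a GAIL-style discriminator score as the matching signal are assumptions, not the paper's exact formulation.

```python
import math

def matching_reward(disc_score: float) -> float:
    # Illustrative GAIL-style matching reward from a discriminator
    # score in (0, 1): larger when the agent's state-action pairs
    # look more like its component of the target (expert) visitation
    # distribution. The exact form used by DM^2 may differ.
    eps = 1e-8
    return -math.log(1.0 - disc_score + eps)

def combined_reward(task_reward: float, disc_score: float,
                    lam: float = 0.5) -> float:
    # Each agent independently optimizes the cooperative task reward
    # plus a weighted distribution-matching bonus; `lam` is a
    # hypothetical mixing coefficient, not taken from the paper.
    return task_reward + lam * matching_reward(disc_score)
```

With `lam = 0` the agent reduces to a naive distributed learner on the task reward alone; increasing `lam` pulls each agent's visitation distribution toward its component of the expert's.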