全局最优化的合作多智能体强化学习：基于TAD框架的转化和蒸馏方法 (Towards Global Optimality in Cooperative MARL with the Transformation And Distillation Framework) - 专知论文

会员服务 ·

0

优化器 · 全局优化 · 蒸馏 · Learning · Performer ·

2023 年 3 月 23 日

Towards Global Optimality in Cooperative MARL with the Transformation And Distillation Framework

翻译：全局最优化的合作多智能体强化学习：基于TAD框架的转化和蒸馏方法

Jianing Ye,Chenghao Li,Jianhao Wang,Chongjie Zhang

Decentralized execution is one core demand in cooperative multi-agent reinforcement learning (MARL). Recently, most popular MARL algorithms have adopted decentralized policies to enable decentralized execution and use gradient descent as their optimizer. However, there is hardly any theoretical analysis of these algorithms taking the optimization method into consideration, and we find that various popular MARL algorithms with decentralized policies are suboptimal in toy tasks when gradient descent is chosen as their optimization method. In this paper, we theoretically analyze two common classes of algorithms with decentralized policies -- multi-agent policy gradient methods and value-decomposition methods to prove their suboptimality when gradient descent is used. In addition, we propose the Transformation And Distillation (TAD) framework, which reformulates a multi-agent MDP as a special single-agent MDP with a sequential structure and enables decentralized execution by distilling the learned policy on the derived ``single-agent" MDP. This approach uses a two-stage learning paradigm to address the optimization problem in cooperative MARL, maintaining its performance guarantee. Empirically, we implement TAD-PPO based on PPO, which can theoretically perform optimal policy learning in the finite multi-agent MDPs and shows significant outperformance on a large set of cooperative multi-agent tasks.

翻译：分散执行是合作多智能体强化学习（MARL）中的一项核心需求。最近，大多数流行的MARL算法已经采用分散策略来实现分散执行，使用梯度下降作为它们的优化器。然而，在考虑优化方法时，对这些算法进行理论分析几乎没有，当梯度下降作为它们的优化方法时，我们发现各种流行的具有分散策略的MARL算法在玩具任务中都是次优的。在本文中，我们对具有分散策略的两类常见算法进行了理论分析--多智能体策略梯度方法和价值分解方法，证明了它们在使用梯度下降时的亚优性。此外，我们提出了转化和蒸馏(TAD)框架，将多智能体MDP重构为具有连续结构的特殊单智能体MDP，并通过在推导的“单智能体”MDP上蒸馏学习的策略来实现分散执行。这种方法使用两阶段学习范式来解决合作MARL中的优化问题，保持其性能保证。在实证方面，我们基于PPO实现了TAD-PPO，这在有限的多智能体MDP中可以在理论上执行最优策略学习，并在大量合作多智能体任务中表现出明显的优越性。

0

相关内容

优化器

【CMU博士论文】分布式强化学习自动驾驶，100页pdf

【CMU博士论文】分布式强化学习自动驾驶，100页pdf

专知会员服务

37+阅读 · 2023年4月17日

【阿姆斯特丹博士论文】为强化学习和计算机视觉应用构建深度学习模型，216页pdf

【阿姆斯特丹博士论文】为强化学习和计算机视觉应用构建深度学习模型，216页pdf

专知会员服务

50+阅读 · 2023年3月22日

【MIT博士论文】对抗场景中鲁棒且可扩展的多智能体强化学习，123页pdf

【MIT博士论文】对抗场景中鲁棒且可扩展的多智能体强化学习，123页pdf

专知会员服务

104+阅读 · 2022年9月21日

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

104+阅读 · 2022年2月10日

【2022新书】强化学习工业应用，408页pdf

【2022新书】强化学习工业应用，408页pdf

专知会员服务

231+阅读 · 2022年2月3日

【SIGIR2021教程】基于强化学习的信息检索

专知会员服务

28+阅读 · 2021年7月20日

【KDD2021】基于因果反事实Shapley的MARL信度分配

专知会员服务

19+阅读 · 2021年7月11日

【2020 最新论文】节点邻近的图池化的层次表示学习 Graph Pooling with Node Proximity for Hierarchical Representation Learning

【2020 最新论文】节点邻近的图池化的层次表示学习 Graph Pooling with Node Proximity for Hierarchical Representation Learning

专知会员服务

43+阅读 · 2020年7月19日

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

专知会员服务

18+阅读 · 2019年11月1日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

17种深度强化学习算法用Pytorch实现

17种深度强化学习算法用Pytorch实现

新智元

31+阅读 · 2019年9月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【OpenAI】深度强化学习关键论文列表

【OpenAI】深度强化学习关键论文列表

专知

11+阅读 · 2018年11月10日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

基于重要性采样的并行离策略强化学习方法研究

国家自然科学基金

23+阅读 · 2015年12月31日

Poisson流形上的修正Hamilton方法

国家自然科学基金

0+阅读 · 2014年12月31日

稀疏支持向量机的理论、算法及应用研究

国家自然科学基金

0+阅读 · 2013年12月31日

一类非凸分裂可行问题及其应用研究

国家自然科学基金

1+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

一类单位逼近卷积函数的边界渐近问题

国家自然科学基金

0+阅读 · 2013年12月31日

受限制策略下多臂Bandit过程的理论与应用研究

国家自然科学基金

0+阅读 · 2012年12月31日

半弧传递图与半边传递图的研究

国家自然科学基金

0+阅读 · 2012年12月31日

组合矩阵论中两类问题的研究

国家自然科学基金

1+阅读 · 2011年12月31日

委托代理问题的一类优化方法和算法设计研究

国家自然科学基金

0+阅读 · 2009年12月31日

A Unifying Formal Approach to Importance Values in Boolean Functions

Arxiv

0+阅读 · 2023年5月14日

Fisher Information Embedding for Node and Graph Learning

Fisher Information Embedding for Node and Graph Learning

Arxiv

0+阅读 · 2023年5月12日

Inapplicable Actions Learning for Knowledge Transfer in Reinforcement Learning

Arxiv

0+阅读 · 2023年5月11日

Active Learning in the Predict-then-Optimize Framework: A Margin-Based Approach

Arxiv

0+阅读 · 2023年5月11日

Enabling Technologies for Programmable and Software-Defined Networks: Bolstering the Path Towards 6G

Arxiv

0+阅读 · 2023年5月10日

Learning with Differentiable Algorithms

Arxiv

11+阅读 · 2022年9月1日

Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective

Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective

Arxiv

17+阅读 · 2020年9月8日

Spectral Clustering with Graph Neural Networks for Graph Pooling

Arxiv

25+阅读 · 2020年6月3日

End-to-End Multi-Task Learning with Attention

Arxiv

19+阅读 · 2018年3月28日

NDDR-CNN: Layer-wise Feature Fusing in Multi-Task CNN by Neural Discriminative Dimensionality Reduction

Arxiv

15+阅读 · 2018年1月25日

VIP会员

文章信息

相关主题

相关VIP内容

【CMU博士论文】分布式强化学习自动驾驶，100页pdf

【CMU博士论文】分布式强化学习自动驾驶，100页pdf

专知会员服务

37+阅读 · 2023年4月17日

【阿姆斯特丹博士论文】为强化学习和计算机视觉应用构建深度学习模型，216页pdf

【阿姆斯特丹博士论文】为强化学习和计算机视觉应用构建深度学习模型，216页pdf

专知会员服务

50+阅读 · 2023年3月22日

【MIT博士论文】对抗场景中鲁棒且可扩展的多智能体强化学习，123页pdf

【MIT博士论文】对抗场景中鲁棒且可扩展的多智能体强化学习，123页pdf

专知会员服务

104+阅读 · 2022年9月21日

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

104+阅读 · 2022年2月10日

【2022新书】强化学习工业应用，408页pdf

【2022新书】强化学习工业应用，408页pdf

专知会员服务

231+阅读 · 2022年2月3日

【SIGIR2021教程】基于强化学习的信息检索

专知会员服务

28+阅读 · 2021年7月20日

【KDD2021】基于因果反事实Shapley的MARL信度分配

专知会员服务

19+阅读 · 2021年7月11日

【2020 最新论文】节点邻近的图池化的层次表示学习 Graph Pooling with Node Proximity for Hierarchical Representation Learning

【2020 最新论文】节点邻近的图池化的层次表示学习 Graph Pooling with Node Proximity for Hierarchical Representation Learning

专知会员服务

43+阅读 · 2020年7月19日

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

专知会员服务

18+阅读 · 2019年11月1日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

《小型无人机系统侦测追踪技术：声学、计算机视觉与深度学习融合方案》最新98页

《"牧羊人网格"拦截策略：实现无人机集群可靠拦截的新范式》

光纤无人机：反无人机系统的重大挑战

《作战建模与仿真实证研究》

相关资讯

17种深度强化学习算法用Pytorch实现

17种深度强化学习算法用Pytorch实现

新智元

31+阅读 · 2019年9月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【OpenAI】深度强化学习关键论文列表

【OpenAI】深度强化学习关键论文列表

专知

11+阅读 · 2018年11月10日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

A Unifying Formal Approach to Importance Values in Boolean Functions

Arxiv

0+阅读 · 2023年5月14日

Fisher Information Embedding for Node and Graph Learning

Fisher Information Embedding for Node and Graph Learning

Arxiv

0+阅读 · 2023年5月12日

Inapplicable Actions Learning for Knowledge Transfer in Reinforcement Learning

Arxiv

0+阅读 · 2023年5月11日

Active Learning in the Predict-then-Optimize Framework: A Margin-Based Approach

Arxiv

0+阅读 · 2023年5月11日

Enabling Technologies for Programmable and Software-Defined Networks: Bolstering the Path Towards 6G

Arxiv

0+阅读 · 2023年5月10日

Learning with Differentiable Algorithms

Arxiv

11+阅读 · 2022年9月1日

Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective

Adversarial Machine Learning in Image Classification: A Survey Towards the Defender's Perspective

Arxiv

17+阅读 · 2020年9月8日

Spectral Clustering with Graph Neural Networks for Graph Pooling

Arxiv

25+阅读 · 2020年6月3日

End-to-End Multi-Task Learning with Attention

Arxiv

19+阅读 · 2018年3月28日

NDDR-CNN: Layer-wise Feature Fusing in Multi-Task CNN by Neural Discriminative Dimensionality Reduction

Arxiv

15+阅读 · 2018年1月25日

相关基金

基于重要性采样的并行离策略强化学习方法研究

国家自然科学基金

23+阅读 · 2015年12月31日

Poisson流形上的修正Hamilton方法

国家自然科学基金

0+阅读 · 2014年12月31日

稀疏支持向量机的理论、算法及应用研究

国家自然科学基金

0+阅读 · 2013年12月31日

一类非凸分裂可行问题及其应用研究

国家自然科学基金

1+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

一类单位逼近卷积函数的边界渐近问题

国家自然科学基金

0+阅读 · 2013年12月31日

受限制策略下多臂Bandit过程的理论与应用研究

国家自然科学基金

0+阅读 · 2012年12月31日

半弧传递图与半边传递图的研究

国家自然科学基金

0+阅读 · 2012年12月31日

组合矩阵论中两类问题的研究

国家自然科学基金

1+阅读 · 2011年12月31日

委托代理问题的一类优化方法和算法设计研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员