马尔科夫潜在运动会中多重利益政策逐步升级的全球趋同 (Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games) - 专知论文

会员服务 ·

0

情景 · 势函数 · 相互独立的 · 规范化的 · FAST ·

2021 年 6 月 3 日

Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games

翻译：马尔科夫潜在运动会中多重利益政策逐步升级的全球趋同

Stefanos Leonardos,Will Overman,Ioannis Panageas,Georgios Piliouras

Potential games are arguably one of the most important and widely studied classes of normal form games. They define the archetypal setting of multi-agent coordination as all agent utilities are perfectly aligned with each other via a common potential function. Can this intuitive framework be transplanted in the setting of Markov Games? What are the similarities and differences between multi-agent coordination with and without state dependence? We present a novel definition of Markov Potential Games (MPG) that generalizes prior attempts at capturing complex stateful multi-agent coordination. Counter-intuitively, insights from normal-form potential games do not carry over as MPGs can consist of settings where state-games can be zero-sum games. In the opposite direction, Markov games where every state-game is a potential game are not necessarily MPGs. Nevertheless, MPGs showcase standard desirable properties such as the existence of deterministic Nash policies. In our main technical result, we prove fast convergence of independent policy gradient to Nash policies by adapting recent gradient dominance property arguments developed for single agent MDPs to multi-agent learning settings.

翻译：潜在游戏可以说是常见形式游戏中最重要的和广泛研究的类别之一。它们定义了多试剂协调的原型设置, 因为所有代理工具都通过共同的潜在功能完全对齐。这个直观框架能否在马尔科夫运动会的设置中移植? 多试剂协调与国家依赖性之间有什么相似和不同之处? 我们对马科夫潜在运动会(MPG)提出了一个新颖的定义, 概括了先前试图捕捉复杂状态多剂协调的复杂尝试。反直觉地说, 普通形式潜在游戏的洞察不会传过来, 因为 MPG可以包含状态游戏可以是零和游戏的设置。相反, 每个州游戏都具有潜在游戏的Markov 游戏不一定是MPG 。然而, MPG 展示了标准的可取性属性, 比如存在确定性纳什政策。在我们的主要技术结果中, 我们证明独立政策梯度与纳什政策的快速趋同, 其方法是将最近为单一代理MDP公司开发的梯度定位属性参数调整为多剂学习环境。

0

相关内容

【经典书】主动学习理论，226页pdf，Theory of Active Learning

【经典书】主动学习理论，226页pdf，Theory of Active Learning

专知会员服务

127+阅读 · 2021年7月14日

强化学习组合优化综述论文

专知会员服务

62+阅读 · 2021年6月1日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

策略梯度方法的算子视图，An operator view of policy gradient methods

策略梯度方法的算子视图，An operator view of policy gradient methods

专知会员服务

11+阅读 · 2020年6月23日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

253+阅读 · 2020年4月19日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

专知会员服务

24+阅读 · 2019年11月11日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Continuous-Time Convergence Rates in Potential and Monotone Games

Arxiv

0+阅读 · 2021年7月28日

Policy Gradient Methods Find the Nash Equilibrium in N-player General-sum Linear-quadratic Games

Arxiv

0+阅读 · 2021年7月27日

Small-loss bounds for online learning with partial information

Arxiv

0+阅读 · 2021年7月26日

Decentralized Multi-Agent Reinforcement Learning for Task Offloading Under Uncertainty

Decentralized Multi-Agent Reinforcement Learning for Task Offloading Under Uncertainty

Arxiv

0+阅读 · 2021年7月26日

Revisiting Analog Over-the-Air Machine Learning: The Blessing and Curse of Interference

Arxiv

0+阅读 · 2021年7月25日

Distributed Asynchronous Policy Iteration for Sequential Zero-Sum Games and Minimax Control

Arxiv

0+阅读 · 2021年7月24日

Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory

Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory

Arxiv

15+阅读 · 2020年12月15日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

Mean Field Multi-Agent Reinforcement Learning

Arxiv

5+阅读 · 2018年6月12日

Reinforcement Learning for Solving the Vehicle Routing Problem

Arxiv

3+阅读 · 2018年5月21日

VIP会员

文章信息

相关主题

相互独立的

相关VIP内容

【经典书】主动学习理论，226页pdf，Theory of Active Learning

【经典书】主动学习理论，226页pdf，Theory of Active Learning

专知会员服务

127+阅读 · 2021年7月14日

强化学习组合优化综述论文

专知会员服务

62+阅读 · 2021年6月1日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

策略梯度方法的算子视图，An operator view of policy gradient methods

策略梯度方法的算子视图，An operator view of policy gradient methods

专知会员服务

11+阅读 · 2020年6月23日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

253+阅读 · 2020年4月19日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

专知会员服务

24+阅读 · 2019年11月11日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

自动驾驶轨迹规划中的基础模型：进展综述与开放挑战

《用于提升多域战备的大型语言模型辅助场景生成器》报告

【斯坦福博士论文】为人类使用优化 AI 模型

国防领域人工智能规模化应用的理论与实践

相关资讯

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Continuous-Time Convergence Rates in Potential and Monotone Games

Arxiv

0+阅读 · 2021年7月28日

Policy Gradient Methods Find the Nash Equilibrium in N-player General-sum Linear-quadratic Games

Arxiv

0+阅读 · 2021年7月27日

Small-loss bounds for online learning with partial information

Arxiv

0+阅读 · 2021年7月26日

Decentralized Multi-Agent Reinforcement Learning for Task Offloading Under Uncertainty

Decentralized Multi-Agent Reinforcement Learning for Task Offloading Under Uncertainty

Arxiv

0+阅读 · 2021年7月26日

Revisiting Analog Over-the-Air Machine Learning: The Blessing and Curse of Interference

Arxiv

0+阅读 · 2021年7月25日

Distributed Asynchronous Policy Iteration for Sequential Zero-Sum Games and Minimax Control

Arxiv

0+阅读 · 2021年7月24日

Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory

Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory

Arxiv

15+阅读 · 2020年12月15日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

Mean Field Multi-Agent Reinforcement Learning

Arxiv

5+阅读 · 2018年6月12日

Reinforcement Learning for Solving the Vehicle Routing Problem

Arxiv

3+阅读 · 2018年5月21日

微信扫码咨询专知VIP会员