Convergence and Price of Anarchy Guarantees of the Softmax Policy Gradient in Markov Potential Games - 专知 (Zhuanzhi) Papers


June 15, 2022

Convergence and Price of Anarchy Guarantees of the Softmax Policy Gradient in Markov Potential Games


Dingyang Chen, Qi Zhang, Thinh T. Doan

We study the performance of policy gradient methods for the subclass of Markov games known as Markov potential games (MPGs), which extends the notion of normal-form potential games to the stateful setting and includes the important special case of the fully cooperative setting where the agents share an identical reward function. Our focus in this paper is to study the convergence of the policy gradient method for solving MPGs under softmax policy parameterization, both tabular and parameterized with general function approximators such as neural networks. We first show the asymptotic convergence of this method to a Nash equilibrium of MPGs for tabular softmax policies. Second, we derive the finite-time performance of the policy gradient in two settings: 1) using the log-barrier regularization, and 2) using the natural policy gradient under the best-response dynamics (NPG-BR). Finally, extending the notion of price of anarchy (POA) and smoothness in normal-form games, we introduce the POA for MPGs and provide a POA bound for NPG-BR. To our knowledge, this is the first POA bound for solving MPGs. To support our theoretical results, we empirically compare the convergence rates and POA of policy gradient variants for both tabular and neural softmax policies.
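As a rough illustration of the setting described in the abstract, the sketch below runs independent tabular softmax policy gradient on a two-agent, two-action identical-reward matrix game, which is a stateless instance of the fully cooperative special case. The reward matrix, step size, iteration count, and all function names are assumptions chosen for illustration; this is not the authors' implementation and it does not include the log-barrier or NPG-BR variants.

```python
import math

# Hypothetical shared reward: both agents receive REWARD[a1][a2].
# (0, 0) and (1, 1) are both Nash equilibria; (1, 1) is the better one.
REWARD = [[1.0, 0.0],
          [0.0, 2.0]]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_reward(pi1, pi2):
    """Expected shared reward under the product policy (pi1, pi2)."""
    return sum(pi1[a] * pi2[b] * REWARD[a][b]
               for a in range(2) for b in range(2))


def run_independent_pg(theta1, theta2, step_size=1.0, iterations=2000):
    """Simultaneous exact-gradient ascent on each agent's softmax logits."""
    theta1, theta2 = list(theta1), list(theta2)
    for _ in range(iterations):
        pi1, pi2 = softmax(theta1), softmax(theta2)
        value = expected_reward(pi1, pi2)
        # Each agent's action values against the other agent's current policy.
        q1 = [sum(pi2[b] * REWARD[a][b] for b in range(2)) for a in range(2)]
        q2 = [sum(pi1[a] * REWARD[a][b] for a in range(2)) for b in range(2)]
        # Softmax policy gradient: dV/dtheta[a] = pi[a] * (q[a] - V).
        grad1 = [pi1[a] * (q1[a] - value) for a in range(2)]
        grad2 = [pi2[b] * (q2[b] - value) for b in range(2)]
        theta1 = [t + step_size * g for t, g in zip(theta1, grad1)]
        theta2 = [t + step_size * g for t, g in zip(theta2, grad2)]
    pi1, pi2 = softmax(theta1), softmax(theta2)
    return pi1, pi2, expected_reward(pi1, pi2)


# Uniform initialization versus an initialization biased toward action 0.
print(run_independent_pg([0.0, 0.0], [0.0, 0.0]))
print(run_independent_pg([2.0, 0.0], [2.0, 0.0]))
```

In this toy game the uniform start drifts toward the joint action with reward 2, while the start biased toward action 0 settles near the equilibrium with reward 1. A gap of that kind between a reached equilibrium and the best achievable outcome is what a price-of-anarchy style quantity is meant to capture; the paper's precise POA definition for MPGs is not reproduced here.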

