Markov潜在运动会中的独立和分散学习 (Independent and Decentralized Learning in Markov Potential Games) - 专知论文

会员服务 ·

0

相互独立的 · 估计/估计量 · 学成 · 势函数 · 知识 (knowledge) ·

2022 年 5 月 31 日

Independent and Decentralized Learning in Markov Potential Games

翻译：Markov潜在运动会中的独立和分散学习

Chinmay Maheshwari,Manxi Wu,Druv Pai,Shankar Sastry

from arxiv, 42 pages, 3 figures

We propose a multi-agent reinforcement learning dynamics, and analyze its convergence properties in infinite-horizon discounted Markov potential games. We focus on the independent and decentralized setting, where players can only observe the realized state and their own reward in every stage. Players do not have knowledge of the game model, and cannot coordinate with each other. In each stage of our learning dynamics, players update their estimate of a perturbed Q-function that evaluates their total contingent payoff based on the realized one-stage reward in an asynchronous manner. Then, players independently update their policies by incorporating a smoothed optimal one-stage deviation strategy based on the estimated Q-function. A key feature of the learning dynamics is that the Q-function estimates are updated at a faster timescale than the policies. We prove that the policies induced by our learning dynamics converge to a stationary Nash equilibrium in Markov potential games with probability 1. Our results build on the theory of two timescale asynchronous stochastic approximation, and new analysis on the monotonicity of potential function along the trajectory of policy updates in Markov potential games.

翻译：我们建议多试剂强化学习动态, 并分析其在无限象素折扣的Markov潜在游戏中的趋同特性。我们关注独立和分散的场景, 玩家只能在每个阶段观察已经实现的状态和他们自己的奖赏。玩家对游戏模式并不了解, 并且无法相互协调。在学习动态的每个阶段, 玩家更新了他们对于受干扰的Q- 功能的估计, 根据已经实现的一阶段奖赏, 以不同步的方式评估他们的总或有报酬。然后, 玩家独立更新他们的政策, 根据估计的Q- 功能, 引入一个平稳的最佳一阶段偏差战略。学习动态的一个关键特征是, Q- 功能估计的更新速度比政策要快。我们证明, 我们学习动态所引发的政策与马尔科夫潜在游戏的定点纳什平衡相交汇合, 概率 1. 我们的结果建立在两个时间尺度的不同步的随机近近理论基础上, 并且根据Markov 潜在游戏中政策更新的轨迹对潜在功能的单一性进行新的分析。

0

相关内容

相互独立的

相互独立的

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

干扰素诱导基因ASB13拮抗流感病毒复制机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

瘦素调节2型糖尿病大鼠交感神经活性及压力反射敏感性的机制

国家自然科学基金

0+阅读 · 2015年12月31日

基于Catellani反应合成轴手性和平面手性化合物的研究

国家自然科学基金

0+阅读 · 2014年12月31日

预测miRNA在原发性高血压合并2型糖尿病的治疗靶点研究

国家自然科学基金

0+阅读 · 2014年12月31日

肿瘤抗原HCA587与STAT3的相互作用及其促进肿瘤转移的分子机制研究

国家自然科学基金

1+阅读 · 2014年12月31日

Catestatin蛋白肽段抑制动脉粥样硬化的作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

一类时滞非线性系统随机动力学与控制

国家自然科学基金

0+阅读 · 2012年12月31日

几类倒向双重随机微分方程及其应用研究

国家自然科学基金

0+阅读 · 2012年12月31日

miR-146与miR-200 在高脂诱导的大鼠肝脏胰岛素抵抗形成中的作用与机制

国家自然科学基金

0+阅读 · 2011年12月31日

分数布朗运动环境下金融保险中优化问题的研究

国家自然科学基金

0+阅读 · 2009年12月31日

Fast Composite Optimization and Statistical Recovery in Federated Learning

Arxiv

0+阅读 · 2022年7月17日

Policy Optimization for Markov Games: Unified Framework and Faster Convergence

Arxiv

0+阅读 · 2022年7月15日

Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning

Arxiv

0+阅读 · 2022年7月15日

Fully Decentralized Model-based Policy Optimization for Networked Systems

Arxiv

0+阅读 · 2022年7月13日

A Coupling Approach to Analyzing Games with Dynamic Environments

Arxiv

0+阅读 · 2022年7月13日

COOR-PLT: A hierarchical control model for coordinating adaptive platoons of connected and autonomous vehicles at signal-free intersections based on deep reinforcement learning

Arxiv

0+阅读 · 2022年7月1日

Recent Advances in Reinforcement Learning in Finance

Arxiv

11+阅读 · 2021年12月8日

Decentralized and Communication-Free Multi-Robot Navigation through Distributed Games

Arxiv

40+阅读 · 2021年9月15日

The Confluence of Networks, Games and Learning

Arxiv

94+阅读 · 2021年5月17日

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Arxiv

26+阅读 · 2020年2月10日

VIP会员

文章信息

相关主题

相互独立的

估计/估计量

知识 (knowledge)

相关VIP内容

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《美空军条令出版物：战略打击》最新条令

《高能激光武器》22页slides

军事前沿模型

《面向小型无人机或无人飞行器的创新雷达探测与人工智能分类技术》263页

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Fast Composite Optimization and Statistical Recovery in Federated Learning

Arxiv

0+阅读 · 2022年7月17日

Policy Optimization for Markov Games: Unified Framework and Faster Convergence

Arxiv

0+阅读 · 2022年7月15日

Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning

Arxiv

0+阅读 · 2022年7月15日

Fully Decentralized Model-based Policy Optimization for Networked Systems

Arxiv

0+阅读 · 2022年7月13日

A Coupling Approach to Analyzing Games with Dynamic Environments

Arxiv

0+阅读 · 2022年7月13日

COOR-PLT: A hierarchical control model for coordinating adaptive platoons of connected and autonomous vehicles at signal-free intersections based on deep reinforcement learning

Arxiv

0+阅读 · 2022年7月1日

Recent Advances in Reinforcement Learning in Finance

Arxiv

11+阅读 · 2021年12月8日

Decentralized and Communication-Free Multi-Robot Navigation through Distributed Games

Arxiv

40+阅读 · 2021年9月15日

The Confluence of Networks, Games and Learning

Arxiv

94+阅读 · 2021年5月17日

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Arxiv

26+阅读 · 2020年2月10日

相关基金

干扰素诱导基因ASB13拮抗流感病毒复制机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

瘦素调节2型糖尿病大鼠交感神经活性及压力反射敏感性的机制

国家自然科学基金

0+阅读 · 2015年12月31日

基于Catellani反应合成轴手性和平面手性化合物的研究

国家自然科学基金

0+阅读 · 2014年12月31日

预测miRNA在原发性高血压合并2型糖尿病的治疗靶点研究

国家自然科学基金

0+阅读 · 2014年12月31日

肿瘤抗原HCA587与STAT3的相互作用及其促进肿瘤转移的分子机制研究

国家自然科学基金

1+阅读 · 2014年12月31日

Catestatin蛋白肽段抑制动脉粥样硬化的作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

一类时滞非线性系统随机动力学与控制

国家自然科学基金

0+阅读 · 2012年12月31日

几类倒向双重随机微分方程及其应用研究

国家自然科学基金

0+阅读 · 2012年12月31日

miR-146与miR-200 在高脂诱导的大鼠肝脏胰岛素抵抗形成中的作用与机制

国家自然科学基金

0+阅读 · 2011年12月31日

分数布朗运动环境下金融保险中优化问题的研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员