分权的乐观派进步派后裔/Infinite-horizon 竞争性马尔科夫运动会中种族优越派 (Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games) - 专知论文

会员服务 ·

0

Self-Play · 情景 · 学成 · 平稳的 · 评论员 ·

2021 年 7 月 7 日

Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games

翻译：分权的乐观派进步派后裔/Infinite-horizon 竞争性马尔科夫运动会中种族优越派

Chen-Yu Wei,Chung-Wei Lee,Mengxiao Zhang,Haipeng Luo

We study infinite-horizon discounted two-player zero-sum Markov games, and develop a decentralized algorithm that provably converges to the set of Nash equilibria under self-play. Our algorithm is based on running an Optimistic Gradient Descent Ascent algorithm on each state to learn the policies, with a critic that slowly learns the value of each state. To the best of our knowledge, this is the first algorithm in this setting that is simultaneously rational (converging to the opponent's best response when it uses a stationary policy), convergent (converging to the set of Nash equilibria under self-play), agnostic (no need to know the actions played by the opponent), symmetric (players taking symmetric roles in the algorithm), and enjoying a finite-time last-iterate convergence guarantee, all of which are desirable properties of decentralized algorithms.

翻译：我们研究的是无穷的分级算法,并开发了一种分散的算法,这种算法可以与自玩的纳什平衡相融合。我们的算法基于在每个州运行一个优化的梯度梯子梯度算法以学习政策,而批评者则慢慢地学习了每个州的价值。据我们所知,这是这个环境中第一个同时理性的算法(在使用固定政策时与对手的最佳反应相融合 ), 集中(在自玩时与纳什平衡法组合相融合 ), 随机(不需要知道对手的行为 ), 对称(玩家在算法中扮演对称角色 ), 享受有限时间的上世纪趋同保证, 所有这些都是分散算法的可取属性。

0

相关内容

Self-Play

【ICML2021】异质风险最小化，Heterogeneous Risk Minimization

专知会员服务

16+阅读 · 2021年5月21日

【SIGGRAPH 2020】人像阴影处理，Portrait Shadow Manipulation

【SIGGRAPH 2020】人像阴影处理，Portrait Shadow Manipulation

专知会员服务

29+阅读 · 2020年5月19日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

强化学习最优表示的几何视角（A Geometric Perspective on Optimal Representations for Reinforcement Learning）

强化学习最优表示的几何视角（A Geometric Perspective on Optimal Representations for Reinforcement Learning）

专知会员服务

9+阅读 · 2019年12月24日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【AAMSA 2019 | tutorial】多智能体系统中的认知推理Epistemic Reasoning In Multiagent Systems ,法国雷恩François Schwarzentruber

【AAMSA 2019 | tutorial】多智能体系统中的认知推理Epistemic Reasoning In Multiagent Systems ,法国雷恩François Schwarzentruber

专知会员服务

24+阅读 · 2019年5月14日

.NET Core + Ocelot + IdentityServer4 + Consul 基础架构实现

.NET Core + Ocelot + IdentityServer4 + Consul 基础架构实现

DotNet

4+阅读 · 2019年5月25日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

OpenAI丨深度强化学习关键论文列表

OpenAI丨深度强化学习关键论文列表

中国人工智能学会

17+阅读 · 2018年11月10日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Coordinate Descent Methods for DC Minimization

Coordinate Descent Methods for DC Minimization

Arxiv

0+阅读 · 2021年9月9日

The convergence of the Regula Falsi method

Arxiv

0+阅读 · 2021年9月8日

Age of Information in Random Access Channels

Arxiv

0+阅读 · 2021年9月8日

Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning

Arxiv

0+阅读 · 2021年9月8日

Convergence Analysis of Nonconvex Distributed Stochastic Zeroth-order Coordinate Method

Arxiv

0+阅读 · 2021年9月8日

On the Convergence of Decentralized Adaptive Gradient Methods

Arxiv

0+阅读 · 2021年9月7日

Policy Gradient Bayesian Robust Optimization for Imitation Learning

Arxiv

5+阅读 · 2021年6月11日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

Arxiv

7+阅读 · 2018年6月1日

Multiagent Soft Q-Learning

Arxiv

11+阅读 · 2018年4月25日

VIP会员

文章信息

相关主题

相关VIP内容

【ICML2021】异质风险最小化，Heterogeneous Risk Minimization

专知会员服务

16+阅读 · 2021年5月21日

【SIGGRAPH 2020】人像阴影处理，Portrait Shadow Manipulation

【SIGGRAPH 2020】人像阴影处理，Portrait Shadow Manipulation

专知会员服务

29+阅读 · 2020年5月19日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

强化学习最优表示的几何视角（A Geometric Perspective on Optimal Representations for Reinforcement Learning）

强化学习最优表示的几何视角（A Geometric Perspective on Optimal Representations for Reinforcement Learning）

专知会员服务

9+阅读 · 2019年12月24日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【AAMSA 2019 | tutorial】多智能体系统中的认知推理Epistemic Reasoning In Multiagent Systems ,法国雷恩François Schwarzentruber

【AAMSA 2019 | tutorial】多智能体系统中的认知推理Epistemic Reasoning In Multiagent Systems ,法国雷恩François Schwarzentruber

专知会员服务

24+阅读 · 2019年5月14日

热门VIP内容

开通专知VIP会员享更多权益服务

《物联网（IoT）中的无人机通信高效控制》135页

《在GNSS信号降级环境中利用共识实现无人机集群稳健协调》

中程单向攻击无人机的战略意义：俄乌战争启示

《面向无人机集群的避障动态传感器覆盖算法》最新38页

相关资讯

.NET Core + Ocelot + IdentityServer4 + Consul 基础架构实现

.NET Core + Ocelot + IdentityServer4 + Consul 基础架构实现

DotNet

4+阅读 · 2019年5月25日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

OpenAI丨深度强化学习关键论文列表

OpenAI丨深度强化学习关键论文列表

中国人工智能学会

17+阅读 · 2018年11月10日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Coordinate Descent Methods for DC Minimization

Coordinate Descent Methods for DC Minimization

Arxiv

0+阅读 · 2021年9月9日

The convergence of the Regula Falsi method

Arxiv

0+阅读 · 2021年9月8日

Age of Information in Random Access Channels

Arxiv

0+阅读 · 2021年9月8日

Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning

Arxiv

0+阅读 · 2021年9月8日

Convergence Analysis of Nonconvex Distributed Stochastic Zeroth-order Coordinate Method

Arxiv

0+阅读 · 2021年9月8日

On the Convergence of Decentralized Adaptive Gradient Methods

Arxiv

0+阅读 · 2021年9月7日

Policy Gradient Bayesian Robust Optimization for Imitation Learning

Arxiv

5+阅读 · 2021年6月11日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

Arxiv

7+阅读 · 2018年6月1日

Multiagent Soft Q-Learning

Arxiv

11+阅读 · 2018年4月25日

微信扫码咨询专知VIP会员