Policy Gradient Methods Find the Nash Equilibrium in N-player General-sum Linear-quadratic Games - Zhuanzhi (专知) Papers

Nash equilibrium · Noise · MoDELS · Scenarios · Game theory

July 27, 2021

Ben Hambly, Renyuan Xu, Huining Yang

We consider a general-sum N-player linear-quadratic game with stochastic dynamics over a finite horizon and prove the global convergence of the natural policy gradient method to the Nash equilibrium. In order to prove the convergence of the method, we require a certain amount of noise in the system. We give a condition, essentially a lower bound on the covariance of the noise in terms of the model parameters, in order to guarantee convergence. We illustrate our results with numerical experiments to show that even in situations where the policy gradient method may not converge in the deterministic setting, the addition of noise leads to convergence.
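To make the method concrete, here is a minimal Python/NumPy sketch of a natural policy gradient iteration for a finite-horizon N-player linear-quadratic game. It is not the authors' implementation, and several details are assumptions: linear feedback policies u_t^i = -K_t^i x_t, dynamics x_{t+1} = A_t x_t + Σ_i B_t^i u_t^i + w_t, a stage cost x^T Q_t^i x + (u_t^i)^T R_t^i u_t^i and terminal cost x^T Q_T^i x for player i, exact (model-based) gradients in place of sampled estimates, and a backward-in-time sweep over t. Under these assumptions the policy gradient is ∇_{K_t^i} C_i = 2 E_t^i Σ_t, where Σ_t = E[x_t x_t^T] and E_t^i = (R_t^i + (B_t^i)^T P_{t+1}^i B_t^i) K_t^i - (B_t^i)^T P_{t+1}^i (A_t - Σ_{j≠i} B_t^j K_t^j). Right-multiplying by Σ_t^{-1} gives the natural gradient step K_t^i ← K_t^i - 2η E_t^i. The function names `natural_grads` and `npg_backward_sweep` are hypothetical, and the sketch does not check the paper's noise condition for convergence.

```python
import numpy as np


def natural_grads(Ks, A_t, B_t, R_t, P_next):
    """Hypothetical helper: return E_t^i for every player i.

    Under the sketch's assumptions the natural gradient of C_i with respect
    to K_t^i is 2 * E_t^i, so the state covariance Sigma_t does not appear.
    """
    # Closed-loop matrix A_t - sum_j B_t^j K_t^j.
    closed = A_t - sum(B_j @ K_j for B_j, K_j in zip(B_t, Ks))
    grads = []
    for B_i, K_i, R_i, P_i in zip(B_t, Ks, R_t, P_next):
        # Add back player i's own term: A_t - sum_{j != i} B_t^j K_t^j.
        A_minus_i = closed + B_i @ K_i
        grads.append((R_i + B_i.T @ P_i @ B_i) @ K_i - B_i.T @ P_i @ A_minus_i)
    return grads


def npg_backward_sweep(A, B, Q, R, QT, eta=0.05, iters=5000, tol=1e-10):
    """Hypothetical helper: simultaneous natural policy gradient, swept backward in time.

    A[t]: (d, d) state matrix; B[t][i]: (d, k_i) input matrix of player i;
    Q[t][i]: (d, d) and R[t][i]: (k_i, k_i) stage costs; QT[i]: (d, d) terminal cost.
    Policies are u_t^i = -K_t^i x_t. Returns the gains K[t][i] and the largest
    remaining ||E_t^i|| over all t and i.
    """
    T, N = len(A), len(QT)
    d = A[0].shape[0]
    P_next = [q.copy() for q in QT]  # P_T^i = Q_T^i
    gains = [None] * T
    worst = 0.0
    for t in reversed(range(T)):  # t = T-1, ..., 0; later gains are already fixed
        Ks = [np.zeros((B_i.shape[1], d)) for B_i in B[t]]
        for _ in range(iters):
            E = natural_grads(Ks, A[t], B[t], R[t], P_next)
            if max(np.linalg.norm(e) for e in E) < tol:
                break
            # Every player takes a natural gradient step at the same time.
            Ks = [K - 2.0 * eta * e for K, e in zip(Ks, E)]
        E = natural_grads(Ks, A[t], B[t], R[t], P_next)
        worst = max(worst, max(np.linalg.norm(e) for e in E))
        gains[t] = Ks
        # Policy evaluation for the fixed gains:
        # P_t^i = Q_t^i + (K_t^i)^T R_t^i K_t^i + closed^T P_{t+1}^i closed.
        closed = A[t] - sum(B_j @ K_j for B_j, K_j in zip(B[t], Ks))
        P_next = [Q[t][i] + Ks[i].T @ R[t][i] @ Ks[i] + closed.T @ P_next[i] @ closed
                  for i in range(N)]
    return gains, worst
```

A point where every E_t^i = 0 satisfies each player's first-order conditions at every time step, so a returned residual near zero indicates the sweep converged for the chosen step size `eta`; if `eta` is too large relative to the curvature R_t^i + (B_t^i)^T P_{t+1}^i B_t^i, the simultaneous updates can diverge. For instance, with scalar dynamics A_t = 1, B_t^1 = 1, B_t^2 = 0.5, and unit R and terminal costs, the last-step gains converge to 4/9 and 2/9, which solve the stationarity conditions 2k^1 + 0.5k^2 = 1 and 0.5k^1 + 1.25k^2 = 0.5.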
