普通中战场运动会可被选的有说服力的场外游戏 (Provable Fictitious Play for General Mean-Field Games) - 专知论文

会员服务 ·

0

纳什均衡 · 学成 · 平稳的 · 强化学习 · contrastive ·

2020 年 10 月 8 日

Provable Fictitious Play for General Mean-Field Games

翻译：普通中战场运动会可被选的有说服力的场外游戏

Qiaomin Xie,Zhuoran Yang,Zhaoran Wang,Andreea Minca

We propose a reinforcement learning algorithm for stationary mean-field games, where the goal is to learn a pair of mean-field state and stationary policy that constitutes the Nash equilibrium. When viewing the mean-field state and the policy as two players, we propose a fictitious play algorithm which alternatively updates the mean-field state and the policy via gradient-descent and proximal policy optimization, respectively. Our algorithm is in stark contrast with previous literature which solves each single-agent reinforcement learning problem induced by the iterates mean-field states to the optimum. Furthermore, we prove that our fictitious play algorithm converges to the Nash equilibrium at a sublinear rate. To the best of our knowledge, this seems the first provably convergent single-loop reinforcement learning algorithm for mean-field games based on iterative updates of both mean-field state and policy.

翻译：我们为固定平均场游戏建议了一个强化学习算法,目的是学习构成纳什均衡的两套平均场状态和固定地政策。当我们把平均场状态和政策看成是两个玩家时,我们提出一个假游戏算法,通过梯度-白度和近似度政策优化分别更新平均场状态和政策。我们的算法与以前的文献截然不同,这些文献解决了每个单一试剂强化学习问题,而前者是由列位平均场状态到最佳状态引发的。此外,我们证明我们的虚构游戏算法以亚线性速率与纳什平衡相融合。根据我们所知,这似乎是第一个基于均线状态和政策的迭接更新而成的普通场状态和政策的单圈强化学习算法。

0

相关内容

纳什均衡

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【开放书】部分观测动态系统的贝叶斯学习，119页pdf，Bayesian Learning for partially observed dynamical systems

【开放书】部分观测动态系统的贝叶斯学习，119页pdf，Bayesian Learning for partially observed dynamical systems

专知会员服务

41+阅读 · 2019年12月27日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

【斯坦福大学NeuralPS2019】GNN解释器，GNNExplainer: Generating Explanations for Graph Neural Networks，斯坦福大学|Jure Leskovec

【斯坦福大学NeuralPS2019】GNN解释器，GNNExplainer: Generating Explanations for Graph Neural Networks，斯坦福大学|Jure Leskovec

专知会员服务

89+阅读 · 2019年10月13日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

spinningup.openai 强化学习资源完整

spinningup.openai 强化学习资源完整

CreateAMind

6+阅读 · 2018年12月17日

神经网络学习率设置

神经网络学习率设置

机器学习研究会

4+阅读 · 2018年3月3日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Exploratory LQG Mean Field Games with Entropy Regularization

Arxiv

0+阅读 · 2020年11月25日

Iterative training of neural networks for intra prediction

Arxiv

1+阅读 · 2020年11月25日

Leveraging Predictions in Smoothed Online Convex Optimization via Gradient-based Algorithms

Arxiv

0+阅读 · 2020年11月25日

MIMO Radar Waveform-Filter Design for Extended Target Detection from a View of Games

Arxiv

0+阅读 · 2020年11月23日

Multi-agent Deep FBSDE Representation For Large Scale Stochastic Differential Games

Arxiv

0+阅读 · 2020年11月21日

Fast and deep: energy-efficient neuromorphic learning with first-spike times

Arxiv

0+阅读 · 2020年11月19日

Some Game Theoretic Remarks on Two-Player Generalized Cops and Robbers Games

Arxiv

0+阅读 · 2020年11月19日

Distributed projected-reflected-gradient algorithms for stochastic generalized Nash equilibrium problems

Arxiv

0+阅读 · 2020年11月18日

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

Arxiv

3+阅读 · 2020年6月15日

Mean Field Multi-Agent Reinforcement Learning

Arxiv

5+阅读 · 2018年6月12日

VIP会员

文章信息

相关主题

相关VIP内容

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【开放书】部分观测动态系统的贝叶斯学习，119页pdf，Bayesian Learning for partially observed dynamical systems

【开放书】部分观测动态系统的贝叶斯学习，119页pdf，Bayesian Learning for partially observed dynamical systems

专知会员服务

41+阅读 · 2019年12月27日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

【斯坦福大学NeuralPS2019】GNN解释器，GNNExplainer: Generating Explanations for Graph Neural Networks，斯坦福大学|Jure Leskovec

【斯坦福大学NeuralPS2019】GNN解释器，GNNExplainer: Generating Explanations for Graph Neural Networks，斯坦福大学|Jure Leskovec

专知会员服务

89+阅读 · 2019年10月13日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

新书册《几何深度学习的数学基础》

中程单向攻击无人机的战略意义：俄乌战争启示

在无标注条件下适配视觉—语言模型：全面综述

面向视觉语言模型的持续学习：遗忘之外的综述与分类体系

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

spinningup.openai 强化学习资源完整

spinningup.openai 强化学习资源完整

CreateAMind

6+阅读 · 2018年12月17日

神经网络学习率设置

神经网络学习率设置

机器学习研究会

4+阅读 · 2018年3月3日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Exploratory LQG Mean Field Games with Entropy Regularization

Arxiv

0+阅读 · 2020年11月25日

Iterative training of neural networks for intra prediction

Arxiv

1+阅读 · 2020年11月25日

Leveraging Predictions in Smoothed Online Convex Optimization via Gradient-based Algorithms

Arxiv

0+阅读 · 2020年11月25日

MIMO Radar Waveform-Filter Design for Extended Target Detection from a View of Games

Arxiv

0+阅读 · 2020年11月23日

Multi-agent Deep FBSDE Representation For Large Scale Stochastic Differential Games

Arxiv

0+阅读 · 2020年11月21日

Fast and deep: energy-efficient neuromorphic learning with first-spike times

Arxiv

0+阅读 · 2020年11月19日

Some Game Theoretic Remarks on Two-Player Generalized Cops and Robbers Games

Arxiv

0+阅读 · 2020年11月19日

Distributed projected-reflected-gradient algorithms for stochastic generalized Nash equilibrium problems

Arxiv

0+阅读 · 2020年11月18日

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

Arxiv

3+阅读 · 2020年6月15日

Mean Field Multi-Agent Reinforcement Learning

Arxiv

5+阅读 · 2018年6月12日

微信扫码咨询专知VIP会员