In single-agent Markov decision processes, an agent can optimize its policy based on its interaction with the environment. In multi-player Markov games (MGs), however, the interaction is non-stationary due to the behaviors of the other players, so the agent has no fixed optimization objective. In this paper, we treat the evolution of player policies as a dynamical process and propose a novel learning scheme for Nash equilibrium. The core idea is to evolve a player's policy according to not just its current in-game performance, but an aggregation of its performance over history. We show that for a variety of MGs, players following our learning scheme provably converge to a point that approximates a Nash equilibrium. Combined with neural networks, we develop the \emph{empirical policy optimization} algorithm, which is implemented in a reinforcement-learning framework and runs in a distributed way, with each player optimizing its policy based only on its own observations. We use two numerical examples to validate the convergence property on small-scale MGs with $n\ge 2$ players, and a Pong example to demonstrate the potential of our algorithm on large games.
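As a rough illustration of the learning scheme described above (a sketch inferred from the abstract only; the symbols $\pi_i^{(t)}$, $V_i$, and the uniform averaging are illustrative assumptions, not the paper's exact construction), one may picture each player $i$ updating its policy against an aggregation of its historical performance rather than its instantaneous payoff:
\begin{align}
\bar V_i^{(t)}(\pi_i) &= \frac{1}{t}\sum_{s=1}^{t} V_i\!\left(\pi_i,\ \pi_{-i}^{(s)}\right), \\
\pi_i^{(t+1)} &\approx \arg\max_{\pi_i}\ \bar V_i^{(t)}(\pi_i),
\end{align}
where $\pi_{-i}^{(s)}$ denotes the other players' policies at iteration $s$. In this reading, the historical aggregation is what damps the non-stationarity introduced by the evolving opponents and makes convergence to an approximate Nash equilibrium plausible.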