A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games


Games · Equilibria · Reinforcement learning · Reinforcement learning algorithms · Algorithms

April 11, 2023


Samuel Sokota, Ryan D'Orazio, J. Zico Kolter, Nicolas Loizou, Marc Lanctot, Ioannis Mitliagkas, Noam Brown, Christian Kroer

This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm. Our contribution is demonstrating the virtues of magnetic mirror descent as both an equilibrium solver and as an approach to reinforcement learning in two-player zero-sum games. These virtues include: 1) Being the first quantal response equilibria solver to achieve linear convergence for extensive-form games with first order feedback; 2) Being the first standard reinforcement learning algorithm to achieve empirically competitive results with CFR in tabular settings; 3) Achieving favorable performance in 3x3 Dark Hex and Phantom Tic-Tac-Toe as a self-play deep reinforcement learning algorithm.

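To make the idea of an equilibrium solver with first-order feedback concrete, here is a minimal sketch of a magnetic-mirror-descent-style iteration on a small zero-sum matrix game. The abstract does not spell out the update rule, so the closed form used here (the negative-entropy form, new policy ∝ [policy · magnet^(αη) · exp(η·q)]^(1/(1+αη))) is an assumption, as are the payoff matrix, the uniform magnet, the values of `alpha` and `eta`, and all function names. It is an illustration under those assumptions, not the authors' implementation.

```python
import math


def softmax(logits):
    """Normalize log-space weights into a probability vector."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mmd_step(policy, magnet, q_values, eta, alpha):
    """One assumed MMD update: a mirror descent step pulled toward the magnet.

    In log space this is a weighted average of the current policy, the magnet,
    and the action values, followed by renormalization.
    """
    logits = [
        (math.log(p) + alpha * eta * math.log(r) + eta * q) / (1.0 + alpha * eta)
        for p, r, q in zip(policy, magnet, q_values)
    ]
    return softmax(logits)


def action_values(A, x, y):
    """First-order feedback: expected payoff of each action against the opponent."""
    n, m = len(A), len(A[0])
    q_row = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
    q_col = [-sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
    return q_row, q_col


def solve_matrix_game(A, alpha=1.0, eta=0.1, steps=2000):
    """Run simultaneous updates for both players from uniform policies."""
    n, m = len(A), len(A[0])
    x, y = [1.0 / n] * n, [1.0 / m] * m
    magnet_x, magnet_y = list(x), list(y)  # uniform magnets (an assumption)
    for _ in range(steps):
        q_row, q_col = action_values(A, x, y)
        x, y = (
            mmd_step(x, magnet_x, q_row, eta, alpha),
            mmd_step(y, magnet_y, q_col, eta, alpha),
        )
    return x, y


def qre_residual(A, x, y, alpha):
    """Largest deviation from the logit fixed point x ∝ exp(q_row / alpha)."""
    q_row, q_col = action_values(A, x, y)
    target_x = softmax([q / alpha for q in q_row])
    target_y = softmax([q / alpha for q in q_col])
    gaps = [abs(a - b) for a, b in zip(x + y, target_x + target_y)]
    return max(gaps)


# Hypothetical biased rock-paper-scissors payoffs for the row player.
A = [[0.0, -1.0, 2.0], [1.0, 0.0, -1.0], [-2.0, 1.0, 0.0]]
x, y = solve_matrix_game(A)
print(x, y, qre_residual(A, x, y, alpha=1.0))
```

With a uniform magnet, a fixed point of this update satisfies the logit quantal response condition at temperature `alpha`, which is what `qre_residual` measures. The step size is kept small relative to `alpha` here as a conservative choice.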
