Near-Optimal Learning of Extensive-Form Games with Imperfect Information - Zhuanzhi Papers (专知论文)


Optimality · Imperfect information · Game theory · Sample complexity · Games

March 30, 2023

Near-Optimal Learning of Extensive-Form Games with Imperfect Information


Yu Bai, Chi Jin, Song Mei, Tiancheng Yu

From arXiv. Updated V2 to be consistent with the ICML 2022 camera-ready version, with an additional analysis of CFR in the full-feedback setting in Appendix F.

This paper resolves the open question of designing near-optimal algorithms for learning imperfect-information extensive-form games from bandit feedback. We present the first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum games, where $X,Y$ are the number of information sets and $A,B$ are the number of actions for the two players. This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$, and matches the information-theoretic lower bound up to logarithmic factors. We achieve this sample complexity by two new algorithms: Balanced Online Mirror Descent, and Balanced Counterfactual Regret Minimization. Both algorithms rely on novel approaches of integrating balanced exploration policies into their classical counterparts. We also extend our results to learning Coarse Correlated Equilibria in multi-player general-sum games.
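To make the size of the stated improvement concrete, the sketch below compares only the leading terms of the two bounds quoted in the abstract, $(XA+YB)/\varepsilon^2$ and $(X^2A+Y^2B)/\varepsilon^2$. It drops all constants and logarithmic factors hidden by the $\widetilde{\mathcal{O}}$ notation, so the numbers are orders of magnitude rather than real episode counts. The function names and the example game sizes are hypothetical and chosen for illustration; they do not come from the paper.

```python
def leading_term_new(X, A, Y, B, eps):
    """Leading term (XA + YB) / eps^2 of the bound stated in the abstract.

    Constants and log factors are ignored (an assumption of this sketch).
    """
    return (X * A + Y * B) / eps**2


def leading_term_previous(X, A, Y, B, eps):
    """Leading term (X^2 A + Y^2 B) / eps^2 of the previously best bound."""
    return (X**2 * A + Y**2 * B) / eps**2


def improvement_factor(X, A, Y, B):
    """Ratio of the two leading terms; eps cancels out.

    Since X^2 A + Y^2 B <= max(X, Y) * (XA + YB), this ratio never
    exceeds max(X, Y), which matches the factor quoted in the abstract.
    """
    return (X**2 * A + Y**2 * B) / (X * A + Y * B)


if __name__ == "__main__":
    # Hypothetical game sizes, for illustration only.
    X, A, Y, B, eps = 1000, 3, 500, 2, 0.1
    print("new bound, leading term:     ", leading_term_new(X, A, Y, B, eps))
    print("previous bound, leading term:", leading_term_previous(X, A, Y, B, eps))
    print("improvement factor:          ", improvement_factor(X, A, Y, B))
    print("max(X, Y):                   ", max(X, Y))
```

When both players have the same number of information sets ($X = Y$), the ratio equals $X$ exactly; when the sizes differ, it lies below $\max\{X, Y\}$.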

