This paper resolves the open question of designing near-optimal algorithms for learning imperfect-information extensive-form games from bandit feedback. We present the first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum games, where $X,Y$ are the number of information sets and $A,B$ are the number of actions for the two players. This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$, and matches the information-theoretic lower bound up to logarithmic factors. We achieve this sample complexity via two new algorithms: Balanced Online Mirror Descent and Balanced Counterfactual Regret Minimization. Both algorithms rely on novel approaches to integrating \emph{balanced exploration policies} into their classical counterparts. We also extend our results to learning Coarse Correlated Equilibria in multi-player general-sum games.
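To convey the idea of a balanced exploration policy informally (the notation below is illustrative and need not match the main text): for a fixed step $h$, such a policy plays each action at an information set in proportion to the number of step-$h$ information sets it leads to,
\[
\mu^{\star,h}(a \mid x) \;=\; \frac{|\mathcal{C}_h(x,a)|}{|\mathcal{C}_h(x)|},
\]
where $\mathcal{C}_h(x)$ denotes the set of step-$h$ information sets reachable from information set $x$, and $\mathcal{C}_h(x,a)$ those reachable after taking action $a$ at $x$. Intuitively, this spreads exploration evenly over the game tree so that every information set is reached with probability comparable to its share of the tree, which is what removes the extra $\max\{X,Y\}$ factor from the sample complexity.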