Regret-Minimizing Double Oracle for Extensive-Form Games (Regret-Minimizing Double Oracle for Extensive-Form Games) - 专知论文

会员服务 ·

0

样本复杂度 · 博弈 · 样本 · Oracle · 元算法 ·

2023 年 4 月 20 日

Regret-Minimizing Double Oracle for Extensive-Form Games

翻译：Regret-Minimizing Double Oracle for Extensive-Form Games

Xiaohang Tang,Le Cong Dinh,Stephen Marcus McAleer,Yaodong Yang

from arxiv, 17 pages

By incorporating regret minimization, double oracle methods have demonstrated rapid convergence to Nash Equilibrium (NE) in normal-form games and extensive-form games, through algorithms such as online double oracle (ODO) and extensive-form double oracle (XDO), respectively. In this study, we further examine the theoretical convergence rate and sample complexity of such regret minimization-based double oracle methods, utilizing a unified framework called Regret-Minimizing Double Oracle. Based on this framework, we extend ODO to extensive-form games and determine its sample complexity. Moreover, we demonstrate that the sample complexity of XDO can be exponential in the number of information sets $|S|$, owing to the exponentially decaying stopping threshold of restricted games. To solve this problem, we propose the Periodic Double Oracle (PDO) method, which has the lowest sample complexity among all existing double oracle methods, being only polynomial in $|S|$. Empirical evaluations on multiple poker and board games show that PDO achieves significantly faster convergence than previous double oracle algorithms and reaches a competitive level with state-of-the-art regret minimization methods.

翻译：通过引入遗憾最小化的思想，双正则元方法已经被证明在标准型博弈和扩展型博弈中以非常快的速度（利用算法比如在线双正则元和扩展型双正则元）收敛于纳什均衡点。在本论文中，我们进一步考察这种基于遗憾最小化的双正则元方法的理论收敛速度和样本复杂度，使用一个被称为“遗憾最小化的双正则元法”的统一框架。在此框架下，我们将在线双正则元扩展到了扩展型博弈，并研究了它的样本复杂度。此外，我们发现，由于受到限制博弈指数级递减停止阈值的影响，扩展型双正则元的样本复杂度可能是指数级的。为解决这一问题，我们提出了周期双正则元方法(PDO)，它在所有现有的双正则元方法中具有最低的样本复杂度，仅仅是$|S|$多项式级别。在多个扑克和棋类游戏上的实证评估表明，PDO比以前的双正则元算法收敛得更快，并且达到了现有基于遗憾最小化的最新算法的竞争水平。

0

相关内容

样本复杂度

样本复杂度

不可错过！杜克大学《因果推断》课程，全面讲述因果推理

不可错过！杜克大学《因果推断》课程，全面讲述因果推理

专知会员服务

51+阅读 · 2022年10月22日

南大《优化方法（Optimization Methods》课程，推荐！

南大《优化方法（Optimization Methods》课程，推荐！

专知会员服务

80+阅读 · 2022年4月3日

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

专知会员服务

159+阅读 · 2020年2月29日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

量化金融强化学习论文集合

量化金融强化学习论文集合

专知

14+阅读 · 2019年12月18日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

受体MDSCs通过CEACAM1-TIM3调控NK细胞功能介导肝移植免疫耐受的机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

套子代数的Hochschild上同调及套的分类

国家自然科学基金

3+阅读 · 2014年12月31日

ROC1活化mTOR通路促进膀胱癌侵袭及转移的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Pictet–Spengler类反应机理的理论研究和新反应设计

国家自然科学基金

0+阅读 · 2013年12月31日

在线和离线折衷排序研究

国家自然科学基金

0+阅读 · 2012年12月31日

小样本空间制图

国家自然科学基金

0+阅读 · 2012年12月31日

CyPA在OX40-OX40L受体-配体轴调控动脉粥样斑块形成中的作用及机制

国家自然科学基金

0+阅读 · 2011年12月31日

Ube2g2/gp78催化泛素链合成机制的研究

国家自然科学基金

0+阅读 · 2009年12月31日

p进表示的伽罗瓦上同调

国家自然科学基金

0+阅读 · 2008年12月31日

面向功效的中药数据集市的多维分析和数据挖掘

国家自然科学基金

0+阅读 · 2008年12月31日

OptimShare: A Unified Framework for Privacy Preserving Data Sharing -- Towards the Practical Utility of Data with Privacy

Arxiv

0+阅读 · 2023年6月6日

Abstracting Imperfect Information Away from Two-Player Zero-Sum Games

Arxiv

0+阅读 · 2023年6月6日

Robust Statistical Inference for Large-dimensional Matrix-valued Time Series via Iterative Huber Regression

Arxiv

0+阅读 · 2023年6月5日

Designing Equilibria in Concurrent Games with Social Welfare and Temporal Logic Constraints

Arxiv

0+阅读 · 2023年6月5日

Roulette-Wheel Selection-Based PSO Algorithm for Solving the Vehicle Routing Problem with Time Windows

Arxiv

0+阅读 · 2023年6月4日

Deep Partial Least Squares for Instrumental Variable Regression

Arxiv

0+阅读 · 2023年6月2日

Bayes-optimal limits in structured PCA, and how to reach them

Arxiv

0+阅读 · 2023年6月2日

Convergence analysis of equilibrium methods for inverse problems

Arxiv

0+阅读 · 2023年6月2日

Robust Recursive Filtering and Smoothing

Arxiv

0+阅读 · 2023年5月31日

Rethinking Knowledge Graph Propagation for Zero-Shot Learning

Arxiv

17+阅读 · 2018年5月31日

VIP会员

文章信息

相关主题

样本复杂度

相关VIP内容

不可错过！杜克大学《因果推断》课程，全面讲述因果推理

不可错过！杜克大学《因果推断》课程，全面讲述因果推理

专知会员服务

51+阅读 · 2022年10月22日

南大《优化方法（Optimization Methods》课程，推荐！

南大《优化方法（Optimization Methods》课程，推荐！

专知会员服务

80+阅读 · 2022年4月3日

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

专知会员服务

159+阅读 · 2020年2月29日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

《乌克兰无人机产业：志愿者与政策在构建新兴无人机产业中的协同作用》最新报告

《人工智能辅助决策中的数据可视化：系统性综述》

人工智能驱动弹药制造现代化：美国陆军转型之路

《敏捷作战部署中枢纽-辐条基地选址优化研究》80页

相关资讯

量化金融强化学习论文集合

量化金融强化学习论文集合

专知

14+阅读 · 2019年12月18日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

【论文推荐】最新六篇强化学习相关论文—Sublinear、机器阅读理解、加速强化学习、对抗性奖励学习、人机交互

专知

17+阅读 · 2018年4月28日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

OptimShare: A Unified Framework for Privacy Preserving Data Sharing -- Towards the Practical Utility of Data with Privacy

Arxiv

0+阅读 · 2023年6月6日

Abstracting Imperfect Information Away from Two-Player Zero-Sum Games

Arxiv

0+阅读 · 2023年6月6日

Robust Statistical Inference for Large-dimensional Matrix-valued Time Series via Iterative Huber Regression

Arxiv

0+阅读 · 2023年6月5日

Designing Equilibria in Concurrent Games with Social Welfare and Temporal Logic Constraints

Arxiv

0+阅读 · 2023年6月5日

Roulette-Wheel Selection-Based PSO Algorithm for Solving the Vehicle Routing Problem with Time Windows

Arxiv

0+阅读 · 2023年6月4日

Deep Partial Least Squares for Instrumental Variable Regression

Arxiv

0+阅读 · 2023年6月2日

Bayes-optimal limits in structured PCA, and how to reach them

Arxiv

0+阅读 · 2023年6月2日

Convergence analysis of equilibrium methods for inverse problems

Arxiv

0+阅读 · 2023年6月2日

Robust Recursive Filtering and Smoothing

Arxiv

0+阅读 · 2023年5月31日

Rethinking Knowledge Graph Propagation for Zero-Shot Learning

Arxiv

17+阅读 · 2018年5月31日

相关基金

受体MDSCs通过CEACAM1-TIM3调控NK细胞功能介导肝移植免疫耐受的机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

套子代数的Hochschild上同调及套的分类

国家自然科学基金

3+阅读 · 2014年12月31日

ROC1活化mTOR通路促进膀胱癌侵袭及转移的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Pictet–Spengler类反应机理的理论研究和新反应设计

国家自然科学基金

0+阅读 · 2013年12月31日

在线和离线折衷排序研究

国家自然科学基金

0+阅读 · 2012年12月31日

小样本空间制图

国家自然科学基金

0+阅读 · 2012年12月31日

CyPA在OX40-OX40L受体-配体轴调控动脉粥样斑块形成中的作用及机制

国家自然科学基金

0+阅读 · 2011年12月31日

Ube2g2/gp78催化泛素链合成机制的研究

国家自然科学基金

0+阅读 · 2009年12月31日

p进表示的伽罗瓦上同调

国家自然科学基金

0+阅读 · 2008年12月31日

面向功效的中药数据集市的多维分析和数据挖掘

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员