游戏玩家 (Player of Games) - 专知论文

会员服务 ·

0

INFORMS · Performer · 不完美信息 · Self-Play · state-of-the-art ·

2021 年 12 月 6 日

Player of Games

翻译：游戏玩家

Martin Schmid,Matej Moravcik,Neil Burch,Rudolf Kadlec,Josh Davidson,Kevin Waugh,Nolan Bard,Finbarr Timbers,Marc Lanctot,Zach Holland,Elnaz Davoodi,Alden Christianson,Michael Bowling

Games have a long history of serving as a benchmark for progress in artificial intelligence. Recently, approaches using search and learning have shown strong performance across a set of perfect information games, and approaches using game-theoretic reasoning and learning have shown strong performance for specific imperfect information poker variants. We introduce Player of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Player of Games is the first algorithm to achieve strong empirical performance in large perfect and imperfect information games -- an important step towards truly general algorithms for arbitrary environments. We prove that Player of Games is sound, converging to perfect play as available computation time and approximation capacity increases. Player of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker (Slumbot), and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning.

翻译：游戏具有长期历史,可以作为人造智能进步的基准。最近,使用搜索和学习的方法在一系列完美的信息游戏中表现出很强的性能,而使用游戏理论推理和学习的方法则显示出具体不完善的信息扑克变体的强性。我们引入了游戏玩家,这是一种通用算法,它统一了以往的方法,结合了引导搜索、自玩学习和游戏理论推理。游戏玩家是在大型完美和不完善的信息游戏中取得强力实绩的第一个算法,这是向任意环境的真正通用算法迈出的重要一步。我们证明游戏玩家是健全的,随着计算时间和近似能力的增加,我们逐渐趋于完美。游戏玩家在棋局和棋盘中表现得非常出色,击败了德克萨斯无限制握牌(Slumbot)中最强大的公开代理,击败了苏格兰游戏场中最先进的代理,这是一个不完善的信息游戏,展示了引导搜索、学习和游戏理论推理的价值。

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

【硬核课】机器人学习课程，UT Austin朱玉可博士讲述自主机器人的人工智能与机器学习机器学习算法

【硬核课】机器人学习课程，UT Austin朱玉可博士讲述自主机器人的人工智能与机器学习机器学习算法

专知会员服务

40+阅读 · 2020年9月21日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

深度学习搜索，Exploring Deep Learning for Search

深度学习搜索，Exploring Deep Learning for Search

专知会员服务

61+阅读 · 2020年5月9日

Capsule Networks，胶囊网络，57页ppt，布法罗大学

Capsule Networks，胶囊网络，57页ppt，布法罗大学

专知会员服务

69+阅读 · 2020年2月29日

【综述】自动驾驶领域中的强化学习，附18页论文下载

【综述】自动驾驶领域中的强化学习，附18页论文下载

专知会员服务

176+阅读 · 2020年2月8日

【Facebook|AAAI2020】在合作的部分可观察博弈中通过搜索改进策略（Improving Policies via Search in Cooperative Partially Observable Games）

【Facebook|AAAI2020】在合作的部分可观察博弈中通过搜索改进策略（Improving Policies via Search in Cooperative Partially Observable Games）

专知会员服务

16+阅读 · 2019年12月10日

【电子书推荐】强化学习（Reinforcement Learning）法兰克福大学 | Cornelius Weber

【电子书推荐】强化学习（Reinforcement Learning）法兰克福大学 | Cornelius Weber

专知会员服务

45+阅读 · 2019年11月19日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

GAN新书《生成式深度学习》，Generative Deep Learning，379页pdf

GAN新书《生成式深度学习》，Generative Deep Learning，379页pdf

专知会员服务

208+阅读 · 2019年9月30日

【ICML 2019 Tutorials】算法配置：算法设计空间中的学习（Algorithm configuration: learning in the space of algorithm designs），德国弗莱堡大学（University of Freiburg）教授| Frank Hutter，不列颠哥伦比亚大学 (University of British Columbia)| Kevin Leyton Brown

【ICML 2019 Tutorials】算法配置：算法设计空间中的学习（Algorithm configuration: learning in the space of algorithm designs），德国弗莱堡大学（University of Freiburg）教授| Frank Hutter，不列颠哥伦比亚大学 (University of British Columbia)| Kevin Leyton Brown

专知会员服务

8+阅读 · 2019年6月10日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

LibRec 精选：基于LSTM的序列推荐实现（PyTorch）

LibRec 精选：基于LSTM的序列推荐实现（PyTorch）

LibRec智能推荐

50+阅读 · 2018年8月27日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

No-Regret Learning in Dynamic Stackelberg Games

Arxiv

0+阅读 · 2022年2月10日

Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence

Arxiv

0+阅读 · 2022年2月8日

Value Functions for Depth-Limited Solving in Zero-Sum Imperfect-Information Games

Arxiv

0+阅读 · 2022年2月8日

A Ranking Game for Imitation Learning

Arxiv

0+阅读 · 2022年2月7日

DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning

Arxiv

5+阅读 · 2021年6月11日

Modelling Behavioural Diversity for Learning in Open-Ended Games

Arxiv

11+阅读 · 2021年3月14日

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

Arxiv

3+阅读 · 2020年6月15日

Multi-task Deep Reinforcement Learning with PopArt

Multi-task Deep Reinforcement Learning with PopArt

Arxiv

4+阅读 · 2018年9月12日

Sim-to-Real Optimization of Complex Real World Mobile Network with Imperfect Information via Deep Reinforcement Learning from Self-play

Arxiv

4+阅读 · 2018年4月17日

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Arxiv

6+阅读 · 2018年1月16日

VIP会员

文章信息

相关主题

不完美信息

state-of-the-art

相关VIP内容

【硬核课】机器人学习课程，UT Austin朱玉可博士讲述自主机器人的人工智能与机器学习机器学习算法

【硬核课】机器人学习课程，UT Austin朱玉可博士讲述自主机器人的人工智能与机器学习机器学习算法

专知会员服务

40+阅读 · 2020年9月21日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

深度学习搜索，Exploring Deep Learning for Search

深度学习搜索，Exploring Deep Learning for Search

专知会员服务

61+阅读 · 2020年5月9日

Capsule Networks，胶囊网络，57页ppt，布法罗大学

Capsule Networks，胶囊网络，57页ppt，布法罗大学

专知会员服务

69+阅读 · 2020年2月29日

【综述】自动驾驶领域中的强化学习，附18页论文下载

【综述】自动驾驶领域中的强化学习，附18页论文下载

专知会员服务

176+阅读 · 2020年2月8日

【Facebook|AAAI2020】在合作的部分可观察博弈中通过搜索改进策略（Improving Policies via Search in Cooperative Partially Observable Games）

【Facebook|AAAI2020】在合作的部分可观察博弈中通过搜索改进策略（Improving Policies via Search in Cooperative Partially Observable Games）

专知会员服务

16+阅读 · 2019年12月10日

【电子书推荐】强化学习（Reinforcement Learning）法兰克福大学 | Cornelius Weber

【电子书推荐】强化学习（Reinforcement Learning）法兰克福大学 | Cornelius Weber

专知会员服务

45+阅读 · 2019年11月19日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

GAN新书《生成式深度学习》，Generative Deep Learning，379页pdf

GAN新书《生成式深度学习》，Generative Deep Learning，379页pdf

专知会员服务

208+阅读 · 2019年9月30日

【ICML 2019 Tutorials】算法配置：算法设计空间中的学习（Algorithm configuration: learning in the space of algorithm designs），德国弗莱堡大学（University of Freiburg）教授| Frank Hutter，不列颠哥伦比亚大学 (University of British Columbia)| Kevin Leyton Brown

【ICML 2019 Tutorials】算法配置：算法设计空间中的学习（Algorithm configuration: learning in the space of algorithm designs），德国弗莱堡大学（University of Freiburg）教授| Frank Hutter，不列颠哥伦比亚大学 (University of British Columbia)| Kevin Leyton Brown

专知会员服务

8+阅读 · 2019年6月10日

热门VIP内容

开通专知VIP会员享更多权益服务

视觉-语言-动作模型解析：从模块构成到里程碑与挑战

《解析陆域作战方向：一个概念性框架》报告

【博士论文】基于多模态基础模型的上下文学习

追寻真正的AI自主性：从遗留思维到战场优势

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

LibRec 精选：基于LSTM的序列推荐实现（PyTorch）

LibRec 精选：基于LSTM的序列推荐实现（PyTorch）

LibRec智能推荐

50+阅读 · 2018年8月27日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

No-Regret Learning in Dynamic Stackelberg Games

Arxiv

0+阅读 · 2022年2月10日

Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence

Arxiv

0+阅读 · 2022年2月8日

Value Functions for Depth-Limited Solving in Zero-Sum Imperfect-Information Games

Arxiv

0+阅读 · 2022年2月8日

A Ranking Game for Imitation Learning

Arxiv

0+阅读 · 2022年2月7日

DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning

Arxiv

5+阅读 · 2021年6月11日

Modelling Behavioural Diversity for Learning in Open-Ended Games

Arxiv

11+阅读 · 2021年3月14日

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

Arxiv

3+阅读 · 2020年6月15日

Multi-task Deep Reinforcement Learning with PopArt

Multi-task Deep Reinforcement Learning with PopArt

Arxiv

4+阅读 · 2018年9月12日

Sim-to-Real Optimization of Complex Real World Mobile Network with Imperfect Information via Deep Reinforcement Learning from Self-play

Arxiv

4+阅读 · 2018年4月17日

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Arxiv

6+阅读 · 2018年1月16日

微信扫码咨询专知VIP会员