Modeling Strong and Human-Like Gameplay with KL-Regularized Search - Zhuanzhi Papers

Keywords: prediction accuracy · regularization term · divergence · model evaluation · learning

December 14, 2021

Modeling Strong and Human-Like Gameplay with KL-Regularized Search

Athul Paul Jacob, David J. Wu, Gabriele Farina, Adam Lerer, Anton Bakhtin, Jacob Andreas, Noam Brown

We consider the task of building strong but human-like policies in multi-agent decision-making problems, given examples of human behavior. Imitation learning is effective at predicting human actions but may not match the strength of expert humans, while self-play learning and search techniques (e.g. AlphaZero) lead to strong performance but may produce policies that are difficult for humans to understand and coordinate with. We show in chess and Go that regularizing search policies based on the KL divergence from an imitation-learned policy by applying Monte Carlo tree search produces policies that have higher human prediction accuracy and are stronger than the imitation policy. We then introduce a novel regret minimization algorithm that is regularized based on the KL divergence from an imitation-learned policy, and show that applying this algorithm to no-press Diplomacy yields a policy that maintains the same human prediction accuracy as imitation learning while being substantially stronger.
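
The abstract states the idea without formulas. Here is a minimal sketch, assuming the regularized objective for each decision is expected value minus λ · KL(π || τ), where τ is the imitation-learned (anchor) policy. That objective has the closed-form maximizer π(a) ∝ τ(a) · exp(Q(a)/λ), so a large λ keeps π close to the human-like τ and a small λ approaches the greedy, stronger choice. The second function is a generic KL-regularized Hedge loop for a two-player matrix game (follow-the-regularized-leader with a KL term toward τ). It illustrates the kind of regularized regret minimization the abstract describes but is not claimed to be the authors' exact algorithm, and the Monte Carlo tree search variant used for chess and Go is not shown. All function and parameter names (kl_regularized_policy, kl_regularized_hedge, lam, eta) are hypothetical, and the anchor policies are assumed to give every action positive probability.

```python
import math
from typing import List, Sequence, Tuple


def kl_regularized_policy(q: Sequence[float], anchor: Sequence[float], lam: float) -> List[float]:
    """Maximize sum_a pi(a) * q(a) - lam * KL(pi || anchor) over the probability simplex.

    Closed form: pi(a) is proportional to anchor(a) * exp(q(a) / lam).
    Assumption: every anchor(a) > 0 (full support) and lam > 0.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    logits = [math.log(p) + v / lam for p, v in zip(anchor, q)]
    top = max(logits)  # shift by the largest logit for numerical stability
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; actions with p(a) == 0 contribute 0."""
    return sum(pa * math.log(pa / qa) for pa, qa in zip(p, q) if pa > 0)


def kl_regularized_hedge(
    payoff_row: Sequence[Sequence[float]],
    payoff_col: Sequence[Sequence[float]],
    anchor_row: Sequence[float],
    anchor_col: Sequence[float],
    lam: float,
    eta: float,
    iters: int,
) -> Tuple[List[float], List[float]]:
    """Self-play of a KL-regularized Hedge variant (hypothetical sketch) in a matrix game.

    payoff_row[i][j] / payoff_col[i][j]: utilities to the row / column player when
    row plays i and column plays j. At iteration t each player plays
        pi_t(a) proportional to anchor(a) * exp(eta * U_t(a) / (1 + eta * lam * t)),
    where U_t(a) is the cumulative utility of action a against the opponent's past
    policies. Returns each player's average policy over all iterations.
    """
    n, m = len(anchor_row), len(anchor_col)
    cum_row, cum_col = [0.0] * n, [0.0] * m
    avg_row, avg_col = [0.0] * n, [0.0] * m
    for t in range(iters):
        scale = 1.0 + eta * lam * t
        # With lam=1.0 the helper returns anchor * exp(q); q already carries the scaling.
        pi_row = kl_regularized_policy([eta * u / scale for u in cum_row], anchor_row, 1.0)
        pi_col = kl_regularized_policy([eta * u / scale for u in cum_col], anchor_col, 1.0)
        # Accumulate each pure action's expected utility against the opponent's current policy.
        for i in range(n):
            cum_row[i] += sum(payoff_row[i][j] * pi_col[j] for j in range(m))
        for j in range(m):
            cum_col[j] += sum(payoff_col[i][j] * pi_row[i] for i in range(n))
        # Running average of the played policies.
        for i in range(n):
            avg_row[i] += pi_row[i] / iters
        for j in range(m):
            avg_col[j] += pi_col[j] / iters
    return avg_row, avg_col


if __name__ == "__main__":
    # One-shot regularized choice: q favors action 0, the anchor is uniform.
    print(kl_regularized_policy([1.0, 0.0], [0.5, 0.5], lam=1.0))  # ≈ [0.731, 0.269]
    # Row action 0 always pays 1, action 1 pays 0; the row anchor prefers action 1.
    rows = [[1.0, 1.0], [0.0, 0.0]]
    cols = [[0.0, 0.0], [0.0, 0.0]]
    for lam in (0.1, 100.0):
        avg_row, _ = kl_regularized_hedge(rows, cols, [0.2, 0.8], [0.5, 0.5], lam, 1.0, 2000)
        print(lam, avg_row)
```

With λ = 0.1 the row player's average policy moves almost entirely onto the always-better action 0 even though its anchor prefers action 1; with λ = 100 it stays close to the anchor's 0.2. A single parameter thus trades strength against staying near the imitation policy.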
