双蒙特卡洛树搜索 (Dual Monte Carlo Tree Search) - 专知论文

会员服务 ·

0

蒙特卡洛树搜索 · 蒙特卡罗 · AlphaZero · Neural Networks · Networking ·

2021 年 10 月 10 日

Dual Monte Carlo Tree Search

翻译：双蒙特卡洛树搜索

Prashank Kadam,Ruiyang Xu,Karl Lieberherr

from arxiv, 8 pages, 4 figures

AlphaZero, using a combination of Deep Neural Networks and Monte Carlo Tree Search (MCTS), has successfully trained reinforcement learning agents in a tabula-rasa way. The neural MCTS algorithm has been successful in finding near-optimal strategies for games through self-play. However, the AlphaZero algorithm has a significant drawback; it takes a long time to converge and requires high computational power due to complex neural networks for solving games like Chess, Go, Shogi, etc. Owing to this, it is very difficult to pursue neural MCTS research without cutting-edge hardware, which is a roadblock for many aspiring neural MCTS researchers. In this paper, we propose a new neural MCTS algorithm, called Dual MCTS, which helps overcome these drawbacks. Dual MCTS uses two different search trees, a single deep neural network, and a new update technique for the search trees using a combination of the PUCB, a sliding-window, and the epsilon-greedy algorithm. This technique is applicable to any MCTS based algorithm to reduce the number of updates to the tree. We show that Dual MCTS performs better than one of the most widely used neural MCTS algorithms, AlphaZero, for various symmetric and asymmetric games.

翻译：利用深神经网络和蒙特卡洛树搜索(MCTS)的组合,阿尔法泽罗利用深神经网络和蒙特卡洛树搜索(MCTS)成功地培训了强化学习剂。神经MCT算法成功地通过自玩找到了近于最佳的游戏策略。然而,阿尔法泽罗算法有一个很大的缺点;由于复杂的神经网络(如Ches、Go、Shogi等)的组合,需要很长的时间才能聚集并需要很高的计算能力来解决游戏。由于这个原因,很难在没有尖端硬件的情况下进行神经MCTS研究,这是许多有抱负的神经MCTS研究人员的障碍。在本文件中,我们提出了一个新的神经MCTS算法,称为双重MCTS,帮助克服这些缺陷。双重MCTS使用两种不同的搜索树,一个单一的深神经网络,以及一个新的搜索树的更新技术,使用PUCB、滑风、和epslon-greed 算法。这个技术适用于任何基于MCTS的最有活力的神经MTS 算法,我们用来更好地展示各种两极的DNA的最新数字。

0

相关内容

蒙特卡洛树搜索

蒙特卡洛树搜索

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

专知会员服务

21+阅读 · 2019年12月2日

【Google】神经架构搜索（Neural Architecture Search and Beyond），Barret Zoph

【Google】神经架构搜索（Neural Architecture Search and Beyond），Barret Zoph

专知会员服务

31+阅读 · 2019年11月25日

【清华大学】自动微分蒙特卡洛，理论与应用，Automatic Differentiable Monte Carlo: Theory and Application (附pdf）

专知会员服务

28+阅读 · 2019年11月23日

基于图的word2vec负采样( GNEG:Graph-Based Negative Sampling for word2vec)

基于图的word2vec负采样( GNEG:Graph-Based Negative Sampling for word2vec)

专知会员服务

40+阅读 · 2019年11月23日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

分布式并行架构Ray介绍

分布式并行架构Ray介绍

CreateAMind

10+阅读 · 2019年8月9日

已删除

将门创投

4+阅读 · 2019年6月5日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Pre-trained Language Model based Ranking in Baidu Search

Arxiv

9+阅读 · 2021年6月25日

Neural Architecture Search without Training

Neural Architecture Search without Training

Arxiv

10+阅读 · 2021年6月11日

Path Planning using Neural A* Search

Arxiv

5+阅读 · 2021年2月8日

Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks

Arxiv

7+阅读 · 2020年12月15日

Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search

Arxiv

9+阅读 · 2020年6月29日

Causal Discovery with Reinforcement Learning

Arxiv

4+阅读 · 2020年3月19日

Deep Learning for Energy Markets

Deep Learning for Energy Markets

Arxiv

10+阅读 · 2019年4月10日

Risk-Aware Active Inverse Reinforcement Learning

Risk-Aware Active Inverse Reinforcement Learning

Arxiv

8+阅读 · 2019年1月8日

MnasNet: Platform-Aware Neural Architecture Search for Mobile

Arxiv

4+阅读 · 2018年7月31日

Logically-Constrained Reinforcement Learning

Arxiv

5+阅读 · 2018年4月22日

VIP会员

文章信息

相关主题

蒙特卡洛树搜索

Neural Networks

相关VIP内容

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

在线变分推断，76页ppt，A Regret Bound for Online Variational Inference

专知会员服务

21+阅读 · 2019年12月2日

【Google】神经架构搜索（Neural Architecture Search and Beyond），Barret Zoph

【Google】神经架构搜索（Neural Architecture Search and Beyond），Barret Zoph

专知会员服务

31+阅读 · 2019年11月25日

【清华大学】自动微分蒙特卡洛，理论与应用，Automatic Differentiable Monte Carlo: Theory and Application (附pdf）

专知会员服务

28+阅读 · 2019年11月23日

基于图的word2vec负采样( GNEG:Graph-Based Negative Sampling for word2vec)

基于图的word2vec负采样( GNEG:Graph-Based Negative Sampling for word2vec)

专知会员服务

40+阅读 · 2019年11月23日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】扩展可扩展会话推荐的边界

别想太多：高效 R1 风格大型推理模型综述

【ACMMM2025】EvoVLMA: 进化式视觉-语言模型自适应

智能体网络：用AI智能体编织下一代网络

相关资讯

分布式并行架构Ray介绍

分布式并行架构Ray介绍

CreateAMind

10+阅读 · 2019年8月9日

已删除

将门创投

4+阅读 · 2019年6月5日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Pre-trained Language Model based Ranking in Baidu Search

Arxiv

9+阅读 · 2021年6月25日

Neural Architecture Search without Training

Neural Architecture Search without Training

Arxiv

10+阅读 · 2021年6月11日

Path Planning using Neural A* Search

Arxiv

5+阅读 · 2021年2月8日

Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks

Arxiv

7+阅读 · 2020年12月15日

Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search

Arxiv

9+阅读 · 2020年6月29日

Causal Discovery with Reinforcement Learning

Arxiv

4+阅读 · 2020年3月19日

Deep Learning for Energy Markets

Deep Learning for Energy Markets

Arxiv

10+阅读 · 2019年4月10日

Risk-Aware Active Inverse Reinforcement Learning

Risk-Aware Active Inverse Reinforcement Learning

Arxiv

8+阅读 · 2019年1月8日

MnasNet: Platform-Aware Neural Architecture Search for Mobile

Arxiv

4+阅读 · 2018年7月31日

Logically-Constrained Reinforcement Learning

Arxiv

5+阅读 · 2018年4月22日

微信扫码咨询专知VIP会员