AlphaZero, using a combination of deep neural networks and Monte Carlo Tree Search (MCTS), has successfully trained reinforcement learning agents in a tabula-rasa way. The neural MCTS algorithm has been successful in finding near-optimal strategies for games through self-play. However, the AlphaZero algorithm has a significant drawback: it converges slowly and demands substantial computational power, since it relies on large, complex neural networks to solve games such as Chess, Go, and Shogi. As a result, it is very difficult to pursue neural MCTS research without cutting-edge hardware, which is a roadblock for many aspiring neural MCTS researchers. In this paper, we propose a new neural MCTS algorithm, called Dual MCTS, which helps overcome these drawbacks. Dual MCTS uses two different search trees, a single deep neural network, and a new update technique for the search trees that combines PUCB, a sliding window, and the epsilon-greedy algorithm. This technique is applicable to any MCTS-based algorithm and reduces the number of updates to the tree. We show that Dual MCTS performs better than one of the most widely used neural MCTS algorithms, AlphaZero, on various symmetric and asymmetric games.
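The abstract does not spell out the exact update rule, so the following Python sketch is only a rough illustration of how a PUCB/PUCT-style child-selection score can be combined with epsilon-greedy exploration and a sliding window that batches tree updates. All names here (Node, pucb_score, select_child, UpdateWindow) and the specific constants are hypothetical, not taken from the paper.

```python
import math
import random
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Node:
    prior: float          # policy prior P(s, a) from the neural network
    q: float = 0.0        # running mean of backed-up values
    visits: int = 0       # visit count N(s, a)
    children: list = field(default_factory=list)

def pucb_score(child: Node, parent_visits: int, c_puct: float = 1.5) -> float:
    """Standard PUCB/PUCT score: value estimate plus prior-weighted exploration."""
    return child.q + c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)

def select_child(parent: Node, epsilon: float = 0.1) -> Node:
    """Epsilon-greedy over PUCB scores: usually pick the best-scoring child,
    occasionally explore a uniformly random one (hypothetical combination)."""
    if random.random() < epsilon:
        return random.choice(parent.children)
    return max(parent.children, key=lambda c: pucb_score(c, parent.visits))

class UpdateWindow:
    """Sliding window that buffers pending value backups and flushes them to the
    tree only when the window fills, reducing the number of tree updates."""
    def __init__(self, window_size: int = 32):
        self.buffer = deque(maxlen=window_size)

    def push(self, node: Node, value: float) -> None:
        self.buffer.append((node, value))
        if len(self.buffer) == self.buffer.maxlen:
            self.flush()

    def flush(self) -> None:
        while self.buffer:
            node, value = self.buffer.popleft()
            node.visits += 1
            node.q += (value - node.q) / node.visits  # incremental mean update
```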