In recent years, Monte Carlo tree search (MCTS) has achieved widespread adoption within the game community. Its use in conjunction with deep reinforcement learning has produced success stories in many applications. While these approaches have been implemented in various games, from simple board games to more complicated video games such as StarCraft, the use of deep neural networks requires a substantial training period. In this work, we explore online adaptivity in MCTS without requiring pre-training. We present MCTS-TD, an adaptive MCTS algorithm improved with temporal difference learning. We demonstrate our new approach on the game miniXCOM, a simplified version of XCOM, a popular commercial franchise consisting of several turn-based tactical games, and show how adaptivity in MCTS-TD allows for improved performance against opponents.