We present an extension of Monte Carlo Tree Search (MCTS) that strongly increases its efficiency for trees with asymmetry and/or loops. Asymmetric termination of search trees introduces a type of uncertainty that the standard upper confidence bound (UCB) formula does not account for. Our first algorithm (MCTS-T), which assumes a non-stochastic environment, backs up tree structure uncertainty and leverages it for exploration in a modified UCB formula. Results show vastly improved efficiency in a well-known asymmetric domain in which MCTS can perform arbitrarily badly. Next, we connect the ideas about asymmetric termination to the presence of loops in the tree, where the same state appears multiple times in a single trace. An extension to our algorithm (MCTS-T+), which assumes full state observability in addition to non-stochasticity, further increases search efficiency in domains with loops. Benchmark testing on a set of OpenAI Gym and Atari 2600 games indicates that our algorithms always perform better than or equivalent to standard MCTS, and could be first-choice tree search algorithms for non-stochastic, fully-observable environments.
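The abstract does not state the modified UCB formula. Purely as an illustration, the sketch below assumes one plausible form. Each node carries a tree-structure uncertainty σ in [0, 1], set to 0 for terminal states and 1 for subtrees with untried actions. This σ is backed up as a visit-weighted average over children and multiplies the UCB exploration term, so fully enumerated subtrees stop attracting exploration. The names `Node`, `backup_sigma`, `select_action`, `is_loop` and the constant `c` are hypothetical and are not the authors' implementation. The loop check is likewise only an assumed reading of the MCTS-T+ idea that a repeated state within one trace adds no new tree structure.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Hashable, Optional, Sequence


@dataclass
class Node:
    # Hypothetical search-tree node used only for this sketch.
    terminal: bool = False
    visits: int = 0
    value_sum: float = 0.0
    children: Dict[int, "Node"] = field(default_factory=dict)
    # Assumed tree-structure uncertainty: 1 = unexplored, 0 = fully enumerated.
    sigma: float = 1.0

    @property
    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def backup_sigma(node: Node, num_actions: int) -> None:
    """Assumed backup rule for sigma (not taken from the paper)."""
    if node.terminal:
        node.sigma = 0.0  # nothing below a terminal state is left to discover
        return
    total = sum(child.visits for child in node.children.values())
    if len(node.children) < num_actions or total == 0:
        node.sigma = 1.0  # untried actions: structure still unknown
        return
    # Visit-weighted average of the children's uncertainty.
    node.sigma = sum(ch.visits * ch.sigma for ch in node.children.values()) / total


def select_action(node: Node, c: float = 1.0) -> Optional[int]:
    """UCB selection whose exploration bonus is scaled by the child's sigma."""
    best_action, best_score = None, -math.inf
    log_n = math.log(max(node.visits, 1))
    for action, child in node.children.items():
        if child.visits == 0:
            return action  # try unvisited children first
        bonus = c * child.sigma * math.sqrt(log_n / child.visits)
        score = child.q + bonus
        if score > best_score:
            best_action, best_score = action, score
    return best_action


def is_loop(state: Hashable, trace_states: Sequence[Hashable]) -> bool:
    # Assumed MCTS-T+ check: in a deterministic, fully observable environment,
    # revisiting a state earlier in the same trace can be treated like a
    # terminal (sigma = 0) for that trace.
    return state in trace_states
```

With two equally valued children, one terminal and one not yet fully explored, this variant sends the next visit to the unexplored child, whereas plain UCB would treat the two as ties.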