Monte-Carlo Tree Search (MCTS) is an adversarial search paradigm that first found prominence with its success in the domain of computer Go. Early theoretical work established the game-theoretic soundness and convergence bounds for Upper Confidence bounds applied to Trees (UCT), the most popular instantiation of MCTS; however, there remain notable gaps in our understanding of how UCT behaves in practice. In this work, we address one such gap by considering the question of whether UCT can exhibit lookahead pathology -- a paradoxical phenomenon first observed in Minimax search where greater search effort leads to worse decision-making. We introduce a novel family of synthetic games that offer rich modeling possibilities while remaining amenable to mathematical analysis. Our theoretical and experimental results suggest that UCT is indeed susceptible to pathological behavior in a range of games drawn from this family.
翻译:Monte-Carlo 树搜索(MCTS)是一个对抗性搜索模式,在计算机Go领域取得了成功,首先发现它是一个突出的对抗性搜索模式。早期的理论工作确立了对树(UCT)应用的高度信任界限的游戏理论性稳健性和趋同性界限,这是MCTS最受欢迎的即时反应;然而,在我们对UCT在实践中如何表现的理解方面,仍然存在显著差距。在这项工作中,我们通过考虑UCT能否展示外表病理学的问题来解决这种差距。在Minimax的搜索中首次观察到一种自相矛盾的现象,在Minimax的搜索中,更大的搜索努力导致更糟糕的决策。我们引入了一个合成游戏的新式组合,它提供了丰富的建模可能性,同时仍然可以进行数学分析。我们的理论和实验结果表明,从这个家庭抽出的游戏中,UCT确实容易出现病理行为。