Monte Carlo Tree Search (MCTS) is a sampling-based, best-first method for searching for optimal decisions. Its success depends heavily on how the statistical search tree is built, and the selection policy plays a fundamental role in this. One selection policy that works particularly well, and is widely adopted in MCTS, is Upper Confidence Bounds for Trees (UCT). More sophisticated bounds have also been proposed to improve MCTS performance on particular problems. This suggests that, while the MCTS UCT generally behaves well, some variants may behave better, and several works have accordingly proposed evolving the selection policy used in MCTS. Although these works are inspiring, each focuses on a single type of problem, so none offers an in-depth analysis of the circumstances under which an evolved alternative to the MCTS UCT is beneficial. In sharp contrast, in this work we use five functions of different nature, ranging from a unimodal function through multimodal functions to deceptive functions. We demonstrate that evolving the MCTS UCT selection policy can be beneficial in multimodal and deceptive scenarios, whereas the standard MCTS UCT is robust in unimodal scenarios and competitive in the remaining scenarios used in this study.
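For readers unfamiliar with the selection policy discussed above, the following is a minimal sketch of the commonly used UCT score: the child's mean reward plus an exploration bonus of the form c * sqrt(ln(N_parent) / N_child). The function names `uct_score` and `select_child`, the `(total_reward, visits)` representation of a child, and the default constant c = sqrt(2) are illustrative assumptions, not details taken from this work.

```python
import math


def uct_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1-style score of one child (hypothetical helper for illustration)."""
    if visits == 0:
        # Unvisited children get an infinite score so they are tried first.
        return math.inf
    exploit = total_reward / visits  # mean reward observed so far
    explore = c * math.sqrt(math.log(parent_visits) / visits)  # uncertainty bonus
    return exploit + explore


def select_child(children, parent_visits, c=math.sqrt(2)):
    """Return the index of the child with the highest score.

    `children` is a list of (total_reward, visits) pairs (assumed layout).
    """
    scores = [uct_score(r, n, parent_visits, c) for r, n in children]
    return max(range(len(children)), key=scores.__getitem__)
```

An evolved selection policy, in the sense used above, would replace the expression inside `uct_score` with a different formula while leaving the rest of the tree search unchanged.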