In many complex sequential decision-making tasks, online planning is crucial for high performance. For efficient online planning, Monte Carlo Tree Search (MCTS) employs a principled mechanism for trading off exploration against exploitation. MCTS outperforms comparison methods in various discrete decision-making domains such as Go, Chess, and Shogi. Subsequently, extensions of MCTS to continuous domains have been proposed. However, the inherent high branching factor and the resulting explosion of the search tree size limit existing methods. To solve this problem, this paper proposes Continuous Monte Carlo Graph Search (CMCGS), a novel extension of MCTS to online planning in environments with continuous state and action spaces. CMCGS takes advantage of the insight that, during planning, sharing the same action policy among several states can yield high performance. To implement this idea, at each time step CMCGS clusters similar states into a limited number of stochastic action bandit nodes, which produce a layered graph instead of an MCTS search tree. Experimental evaluation with limited sample budgets shows that CMCGS outperforms comparison methods in several complex continuous DeepMind Control Suite benchmarks and a 2D navigation task.
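To make the core idea concrete, the sketch below illustrates one plausible reading of the layered-graph construction: at each planning depth, visited states are clustered into a capped number of nodes, and each node maintains its own stochastic (here, diagonal Gaussian) action policy shared by all states in the cluster. This is a minimal illustrative sketch, not the authors' implementation; the class and function names (`BanditNode`, `assign_to_layer`), the Gaussian policy, and the distance-threshold clustering rule are all assumptions made for exposition.

```python
import numpy as np

class BanditNode:
    """One cluster of similar states with a shared stochastic action policy.

    Hypothetical structure: the paper specifies stochastic action bandit
    nodes; the Gaussian parameterization here is an assumption."""

    def __init__(self, state, action_dim):
        self.centroid = np.asarray(state, dtype=float)  # cluster representative
        self.mean = np.zeros(action_dim)                # policy mean
        self.std = np.ones(action_dim)                  # policy std
        self.count = 1

    def sample_action(self, rng):
        # Sample from the node's shared action distribution.
        return rng.normal(self.mean, self.std)

    def absorb(self, state):
        # Running mean over member states keeps the centroid up to date.
        self.count += 1
        self.centroid += (state - self.centroid) / self.count


def assign_to_layer(layer, state, action_dim, max_nodes, radius):
    """Route a state to the nearest node in a depth layer, or open a new one.

    `max_nodes` caps the effective branching factor per depth; `radius` is
    an assumed clustering threshold. Both are illustrative parameters."""
    if layer:
        dists = [np.linalg.norm(node.centroid - state) for node in layer]
        best = int(np.argmin(dists))
        # Reuse an existing node when the state is close enough,
        # or when the layer has already reached its node budget.
        if dists[best] < radius or len(layer) >= max_nodes:
            layer[best].absorb(state)
            return layer[best]
    node = BanditNode(state, action_dim)
    layer.append(node)
    return node


# Toy usage: thread one simulated trajectory through the layered graph,
# with a random-walk stand-in for the environment dynamics.
rng = np.random.default_rng(0)
layers = [[] for _ in range(5)]  # one list of bandit nodes per planning depth
state = np.zeros(2)
for layer in layers:
    node = assign_to_layer(layer, state, action_dim=2, max_nodes=4, radius=0.5)
    action = node.sample_action(rng)
    state = state + 0.1 * action  # assumed toy transition, not a real benchmark
```

Because each layer holds at most `max_nodes` nodes regardless of how many states are visited, the search graph stays bounded in width, which is how this reading of the method would avoid the tree-size explosion the abstract describes.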