This paper addresses optimal control using search trees. We first consider multi-armed bandit problems with continuous action spaces and propose LD-HOO, a limited-depth variant of the hierarchical optimistic optimization (HOO) algorithm. We provide a regret analysis for LD-HOO and show that, asymptotically, it attains the same cumulative regret as the original HOO while being faster and more memory-efficient. We then propose a Monte Carlo tree search algorithm based on LD-HOO for optimal control problems and illustrate the resulting approach on several such problems.
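To make the idea of a depth cap concrete, the following Python sketch shows one plausible reading of a limited-depth HOO loop on the action space [0, 1]. It is not the paper's implementation. The U- and B-values follow the standard HOO form, but the binary partition, the uniform sampling inside a cell, the defaults `nu=1.0` and `rho=0.5`, the rule that a cell at `max_depth` is resampled instead of split, and the most-played-path recommendation are all assumptions made for illustration. The names `ld_hoo`, `Node`, and `reward_fn` are hypothetical.

```python
import math
import random


class Node:
    """One cell [lo, hi] of a binary partition of the action space."""

    def __init__(self, lo, hi, depth):
        self.lo, self.hi, self.depth = lo, hi, depth
        self.count = 0          # number of plays routed through this cell
        self.mean = 0.0         # running mean of rewards seen in this cell
        self.b_value = math.inf  # unplayed cells are maximally optimistic
        self.children = None    # None until the cell is split


def ld_hoo(reward_fn, n_rounds, max_depth, nu=1.0, rho=0.5, seed=0):
    """Illustrative limited-depth HOO on [0, 1] (assumed rules, see lead-in)."""
    rng = random.Random(seed)
    root = Node(0.0, 1.0, 0)
    nodes = [root]  # creation order: parents always precede their children

    for t in range(1, n_rounds + 1):
        # 1. Descend from the root, always following the larger B-value.
        path = [root]
        node = root
        while node.children is not None:
            node = max(node.children, key=lambda c: c.b_value)
            path.append(node)

        # 2. Play an action inside the selected cell and observe a reward.
        x = rng.uniform(node.lo, node.hi)
        r = reward_fn(x)

        # 3. Update counts and running means along the traversed path.
        for v in path:
            v.count += 1
            v.mean += (r - v.mean) / v.count

        # 4. Split the cell only if the depth cap allows it (assumed rule).
        if node.depth < max_depth:
            mid = (node.lo + node.hi) / 2.0
            node.children = (
                Node(node.lo, mid, node.depth + 1),
                Node(mid, node.hi, node.depth + 1),
            )
            nodes.extend(node.children)

        # 5. Refresh B-values bottom-up (children before parents).
        for v in reversed(nodes):
            if v.count == 0:
                v.b_value = math.inf
                continue
            u = v.mean + math.sqrt(2.0 * math.log(t) / v.count) + nu * rho ** v.depth
            if v.children is None:
                v.b_value = u
            else:
                v.b_value = min(u, max(c.b_value for c in v.children))

    # Recommend the midpoint of the cell at the end of the most-played path.
    node = root
    while node.children is not None:
        node = max(node.children, key=lambda c: c.count)
    return (node.lo + node.hi) / 2.0, nodes
```

A call such as `ld_hoo(lambda x: 1.0 - abs(x - 0.3), n_rounds=300, max_depth=4)` returns a recommended action together with the list of tree nodes. In this sketch the tree can never hold more than `2 ** (max_depth + 1) - 1` nodes, so the per-round B-value refresh has bounded cost. That is the intuition behind a depth cap saving time and memory. How the cap must be chosen to preserve the regret guarantee is a matter for the paper's analysis and is not addressed here.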