Limited depth bandit-based strategy for Monte Carlo planning in continuous action spaces

This paper addresses the problem of optimal control using search trees. We start by considering multi-armed bandit problems with continuous action spaces and propose LD-HOO, a limited depth variant of the hierarchical optimistic optimization (HOO) algorithm. We provide a regret analysis for LD-HOO and show that, asymptotically, our algorithm exhibits the same cumulative regret as the original HOO while being faster and more memory efficient. We then propose a Monte Carlo tree search algorithm based on LD-HOO for optimal control problems and illustrate the resulting approach's application in several optimal control problems.
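To make the idea of a depth-limited HOO tree concrete, the following is a minimal sketch and not the paper's LD-HOO pseudocode. It uses the standard HOO bookkeeping (per-cell counts, means, and optimistic B-values) on the action space [0, 1], and it assumes that the depth limit simply means a cell at depth `max_depth` is sampled but never split. The names `hoo_limited_depth`, `best_action`, `tree_depth`, and the parameters `nu`, `rho`, and `max_depth` are illustrative choices, not taken from the source.

```python
import math
import random


class Node:
    """One cell [lo, hi] of the action space at a given tree depth."""

    def __init__(self, lo, hi, depth):
        self.lo, self.hi, self.depth = lo, hi, depth
        self.count = 0        # rounds in which this cell was on the chosen path
        self.mean = 0.0       # running mean of rewards observed through this cell
        self.b = math.inf     # optimistic B-value; infinite until first visit
        self.children = None  # None until the cell is split


def _refresh(node, n, nu, rho):
    """Recompute B-values bottom-up after round n."""
    if node.count == 0:
        node.b = math.inf
        return
    u = node.mean + math.sqrt(2.0 * math.log(n) / node.count) + nu * rho ** node.depth
    if node.children is None:
        node.b = u  # depth-capped leaf: no children to tighten the bound
        return
    for child in node.children:
        _refresh(child, n, nu, rho)
    node.b = min(u, max(child.b for child in node.children))


def hoo_limited_depth(reward_fn, n_rounds, max_depth, nu=1.0, rho=0.5, seed=0):
    """Run HOO on [0, 1] with an assumed hard cap on tree depth."""
    rng = random.Random(seed)
    root = Node(0.0, 1.0, 0)
    for n in range(1, n_rounds + 1):
        # Descend from the root, always following the child with the largest B-value.
        node, path = root, [root]
        while node.children is not None:
            node = max(node.children, key=lambda c: c.b)
            path.append(node)
        # Play an action drawn uniformly from the selected cell.
        reward = reward_fn(rng.uniform(node.lo, node.hi))
        for visited in path:
            visited.count += 1
            visited.mean += (reward - visited.mean) / visited.count
        # Split the cell unless the depth cap has been reached (the assumed limit rule).
        if node.depth < max_depth:
            mid = 0.5 * (node.lo + node.hi)
            node.children = [Node(node.lo, mid, node.depth + 1),
                             Node(mid, node.hi, node.depth + 1)]
        _refresh(root, n, nu, rho)
    return root


def best_action(root):
    """Recommend the centre of the cell reached by following visit counts."""
    node = root
    while node.children is not None:
        node = max(node.children, key=lambda c: c.count)
    return 0.5 * (node.lo + node.hi)


def tree_depth(node):
    """Depth of the deepest cell in the tree."""
    if node.children is None:
        return node.depth
    return max(tree_depth(child) for child in node.children)


if __name__ == "__main__":
    tree = hoo_limited_depth(lambda x: 1.0 - abs(x - 0.3), n_rounds=500, max_depth=6)
    print(best_action(tree), tree_depth(tree))
```

Because the tree can hold at most 2^(max_depth + 1) - 1 cells under this cap, memory use and the cost of each B-value refresh stay bounded regardless of the number of rounds. That is the intuition behind the speed and memory claims in the abstract, though the paper's own construction and constants may differ from this sketch.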

