Solving Partially Observable Markov Decision Processes (POMDPs) with continuous actions is challenging, particularly for high-dimensional action spaces. To alleviate this difficulty, we propose a new sampling-based online POMDP solver, called Adaptive Discretization using Voronoi Trees (ADVT). It uses Monte Carlo Tree Search in combination with an adaptive discretization of the action space as well as optimistic optimization to efficiently sample high-dimensional continuous action spaces and compute the best action to perform. Specifically, we adaptively discretize the action space for each sampled belief using a hierarchical partition which we call a Voronoi tree. A Voronoi tree is a Binary Space Partitioning (BSP) that implicitly maintains the partition of a cell as the Voronoi diagram of two points sampled from the cell. This partitioning strategy keeps the cost of partitioning and estimating the size of each cell low, even in high-dimensional spaces where many sampled points are required to cover the space well. ADVT uses the estimated sizes of the cells to form an upper-confidence bound of the action values of the cell, and in turn uses the upper-confidence bound to guide the Monte Carlo Tree Search expansion and further discretization of the action space. This strategy enables ADVT to better exploit local information in the action space, leading to an action space discretization that is more adaptive, and hence more efficient in computing good POMDP solutions, compared to existing solvers. Experiments on simulations of four types of benchmark problems indicate that ADVT outperforms and scales substantially better to high-dimensional continuous action spaces, compared to state-of-the-art continuous action POMDP solvers.
翻译:以连续操作的方式解决部分可观测的 Markov 设定进程( POMDPs) 具有挑战性, 特别是对于高维行动空间。 为了缓解这一困难, 我们建议使用Voronoi 树( ADVT ) 进行基于取样的在线 POMDP 解析。 它使用 Monte Carlo 树搜索, 结合适应性分解动作空间, 以及乐观优化, 以高效地取样高维连续动作空间, 并计算要完成的最佳动作。 具体地说, 我们采用我们称之为 Voronoi 树的等级分隔, 将每个抽样信仰的动作空间空间空间空间空间空间空间空间空间空间空间空间空间空间空间空间空间。 Voranoioioi 基准树是一个二元空间分割( BSP ), 暗中维持一个单元格空间空间空间空间空间空间分割的间隔, 将更精确的定位系统空间定位到更精确的轨道上, 使当前空间空间空间的定位向上更精确的定位, 。