Adaptive Discretization using Voronoi Trees for Continuous POMDPs

Solving continuous Partially Observable Markov Decision Processes (POMDPs) is challenging, particularly for high-dimensional continuous action spaces. To alleviate this difficulty, we propose a new sampling-based online POMDP solver, called Adaptive Discretization using Voronoi Trees (ADVT). It combines Monte Carlo Tree Search with an adaptive discretization of the action space and optimistic optimization to efficiently sample high-dimensional continuous action spaces and compute the best action to perform. Specifically, we adaptively discretize the action space for each sampled belief using a hierarchical partition called a Voronoi tree, which is a Binary Space Partitioning that implicitly maintains the partition of a cell as the Voronoi diagram of two points sampled from the cell. ADVT uses the estimated diameters of the cells to form an upper-confidence bound on the action value function within the cell, guiding the Monte Carlo Tree Search expansion and further discretization of the action space. This enables ADVT to better exploit local information with respect to the action value function, allowing faster identification of the most promising regions in the action space compared to existing solvers. Voronoi trees keep the cost of partitioning and estimating the diameter of each cell low, even in high-dimensional spaces where many sampled points are required to cover the space well. ADVT additionally handles continuous observation spaces by adopting an observation progressive widening strategy, along with a weighted particle representation of beliefs. Experimental results indicate that ADVT scales substantially better to high-dimensional continuous action spaces compared to state-of-the-art methods.
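
The Voronoi-tree idea in the abstract can be illustrated with a short Python sketch. This is not the authors' implementation. The class `Cell`, the helpers `estimate_diameter` and `ucb`, the rejection sampler over a bounding box, the sample-based diameter estimate, and the exact form of the bound (value estimate plus a visit-count exploration term plus a diameter term with constants `c` and `c_r`) are all assumptions made for illustration; the abstract only states that each cell is split by the Voronoi diagram of two points sampled from it and that estimated diameters enter an upper-confidence bound.

```python
import math
import random


class Cell:
    """One node of a hypothetical Voronoi tree over a box-shaped action space."""

    def __init__(self, rep, parent=None, sibling_rep=None):
        self.rep = rep                  # representative action of this cell
        self.parent = parent            # enclosing cell (None for the root)
        self.sibling_rep = sibling_rep  # representative of the sibling cell
        self.children = []
        self.visits = 0                 # search statistics (assumed bookkeeping)
        self.q_value = 0.0

    def contains(self, point):
        # The cell is stored implicitly: a point belongs to it if, at every
        # level up to the root, it is at least as close to this branch's
        # representative as to the sibling's representative.
        cell = self
        while cell.parent is not None:
            if math.dist(point, cell.rep) > math.dist(point, cell.sibling_rep):
                return False
            cell = cell.parent
        return True

    def sample(self, bounds, rng, max_tries=100_000):
        # Assumption: plain rejection sampling from the bounding box.
        for _ in range(max_tries):
            p = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
            if self.contains(p):
                return p
        raise RuntimeError("rejection sampling failed to hit the cell")

    def split(self, bounds, rng):
        # Assumption: one child keeps this cell's representative and the
        # other gets a newly sampled point; both points lie in the cell.
        other = self.sample(bounds, rng)
        self.children = [
            Cell(self.rep, parent=self, sibling_rep=other),
            Cell(other, parent=self, sibling_rep=self.rep),
        ]
        return self.children


def estimate_diameter(cell, bounds, rng, n_samples=30):
    # Assumption: approximate the diameter by the largest pairwise distance
    # among a handful of points sampled from the cell.
    pts = [cell.sample(bounds, rng) for _ in range(n_samples)]
    return max(math.dist(p, q) for p in pts for q in pts)


def ucb(cell, total_visits, diameter, c=1.0, c_r=1.0):
    # Assumed form of the bound: value estimate + exploration + diameter term.
    if cell.visits == 0:
        return math.inf
    explore = c * math.sqrt(math.log(total_visits) / cell.visits)
    return cell.q_value + explore + c_r * diameter
```

In a search loop built on this sketch, the cell with the largest bound would be selected, its representative action simulated, and the cell split once it has been visited often enough; how and when ADVT actually refines a cell is not specified in the abstract.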

