This paper introduces Voronoi Progressive Widening (VPW), a generalization of Voronoi optimistic optimization (VOO) and action progressive widening to partially observable Markov decision processes (POMDPs). Tree search algorithms can use VPW to effectively handle continuous or hybrid action spaces by efficiently balancing local and global action searching. This paper proposes two VPW-based algorithms and analyzes them from theoretical and simulation perspectives. Voronoi Optimistic Weighted Sparse Sampling (VOWSS) is a theoretical tool that justifies VPW-based online solvers, and it is the first algorithm with global convergence guarantees for continuous state, action, and observation POMDPs. Voronoi Optimistic Monte Carlo Planning with Observation Weighting (VOMCPOW) is a versatile and efficient algorithm that consistently outperforms state-of-the-art POMDP algorithms in several simulation experiments.
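To make the local/global balance concrete, the following is a minimal Python sketch of how a VPW-style action-selection step might look inside a tree search node. The node layout, helper names, and default parameters (k, alpha, omega) are illustrative assumptions for this sketch, not the paper's implementation or notation, and belief/observation bookkeeping is omitted.

```python
import math
import random
from dataclasses import dataclass, field

@dataclass
class ActionStats:
    action: tuple          # continuous action vector
    value: float = 0.0     # running value estimate for this action
    visits: int = 0

@dataclass
class Node:
    actions: list = field(default_factory=list)  # list[ActionStats]
    visits: int = 1

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ucb(stats, total_visits, c=1.0):
    # Standard upper-confidence criterion for choosing among existing children.
    if stats.visits == 0:
        return float("inf")
    return stats.value + c * math.sqrt(math.log(total_visits) / stats.visits)

def vpw_select_action(node, sample_action, k=30.0, alpha=0.5, omega=0.1):
    """Sketch of Voronoi Progressive Widening (VPW) action selection.

    Widen (add a new action) only while |A(node)| <= k * N(node)^alpha,
    as in action progressive widening. New actions are drawn VOO-style:
    with probability omega, sample the whole action space uniformly
    (global search); otherwise, rejection-sample a point that falls in
    the Voronoi cell of the empirically best action (local search).
    """
    if len(node.actions) <= k * node.visits ** alpha:
        if not node.actions or random.random() < omega:
            new_action = sample_action()                      # global step
        else:
            best = max(node.actions, key=lambda s: s.value)
            others = [s.action for s in node.actions if s is not best]
            while True:                                       # local step
                cand = sample_action()
                d_best = euclidean(cand, best.action)
                if all(euclidean(cand, o) >= d_best for o in others):
                    new_action = cand
                    break
        node.actions.append(ActionStats(new_action))
        return new_action
    # No widening: exploit existing children via the UCB criterion.
    return max(node.actions, key=lambda s: ucb(s, node.visits)).action
```

In this sketch, `sample_action` is any function that draws a uniform sample from the (possibly hybrid) action space; the rejection-sampling loop is one simple way to sample from the best action's Voronoi cell without constructing the cell explicitly.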