Optimality Guarantees for Particle Belief Approximation of POMDPs

Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood weighting have shown practical effectiveness, a general theory characterizing the approximation error of the particle filtering techniques that these algorithms use has not previously been proposed. Our main contribution is bounding the error between any POMDP and its corresponding finite-sample particle belief MDP (PB-MDP) approximation. This fundamental bridge between PB-MDPs and POMDPs allows us to adapt any sampling-based MDP algorithm to a POMDP by solving the corresponding particle belief MDP, thereby extending the convergence guarantees of the MDP algorithm to the POMDP. Practically, this is implemented by using the particle filter belief transition model as the generative model for the MDP solver. While this requires access to the observation density model from the POMDP, it only increases the transition sampling complexity of the MDP solver by a factor of $\mathcal{O}(C)$, where $C$ is the number of particles. Thus, when combined with sparse sampling MDP algorithms, this approach can yield algorithms for POMDPs that have no direct theoretical dependence on the size of the state and observation spaces. In addition to our theoretical contribution, we perform five numerical experiments on benchmark POMDPs to demonstrate that a simple MDP algorithm adapted using PB-MDP approximation, Sparse-PFT, achieves performance competitive with other leading continuous-observation POMDP solvers.
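To make the "particle filter belief transition model as the generative model" idea concrete, the following is a minimal Python sketch of a single PB-MDP generative step, not the authors' implementation. The function name `pb_mdp_step` and the one-dimensional Gaussian toy POMDP (`transition`, `obs_sample`, `obs_density`, `reward`) are assumptions introduced purely for illustration. The step propagates each of the $C$ particles through the transition model, generates one observation from a particle drawn according to the current weights, and reweights all particles with the observation density.

```python
import math
import random

# --- Hypothetical toy POMDP (illustrative assumption, not from the paper) ---
OBS_SIGMA = 0.5

def transition(s, a, rng):
    """Sample s' ~ T(. | s, a): move by a with small Gaussian noise."""
    return s + a + rng.gauss(0.0, 0.1)

def obs_sample(s_next, a, rng):
    """Sample o ~ Z(. | a, s'): noisy reading of the next state."""
    return s_next + rng.gauss(0.0, OBS_SIGMA)

def obs_density(o, s_next, a):
    """Evaluate the observation density Z(o | a, s')."""
    z = (o - s_next) / OBS_SIGMA
    return math.exp(-0.5 * z * z) / (OBS_SIGMA * math.sqrt(2.0 * math.pi))

def reward(s, a):
    """State reward: stay close to the origin."""
    return -abs(s)

# --- Particle belief MDP generative step (sketch) ---
def pb_mdp_step(particles, weights, a, rng):
    """Return (next_particles, next_weights, belief_reward) for action a."""
    total = sum(weights)
    # Belief reward: weighted average of the state reward over the particles.
    r = sum(w * reward(s, a) for s, w in zip(particles, weights)) / total
    # Propagate every particle; this is the O(C) transition-sampling cost.
    next_particles = [transition(s, a, rng) for s in particles]
    # Draw one particle in proportion to its weight and observe from it.
    j = rng.choices(range(len(particles)), weights=weights, k=1)[0]
    o = obs_sample(next_particles[j], a, rng)
    # Likelihood weighting with the observation density.
    next_weights = [w * obs_density(o, sp, a)
                    for sp, w in zip(next_particles, weights)]
    return next_particles, next_weights, r

rng = random.Random(0)
C = 50  # number of particles
belief = [rng.gauss(0.0, 1.0) for _ in range(C)]
weights = [1.0 / C] * C
belief, weights, r = pb_mdp_step(belief, weights, a=0.2, rng=rng)
```

A sampling-based MDP solver could then treat the pair `(particles, weights)` as its state and call `pb_mdp_step` wherever it would normally sample a state transition. The weights are left unnormalized in this sketch, so a practical version would likely normalize or resample them to avoid numerical underflow over long horizons.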
