Planning under partial observability is essential for autonomous robots. A principled way to address such planning problems is the Partially Observable Markov Decision Process (POMDP). Although solving POMDPs is computationally intractable, substantial advancements have been achieved in developing approximate POMDP solvers in the past two decades. However, computing robust solutions for problems with continuous observation spaces remains challenging. Most on-line solvers rely on discretising the observation space or artificially limiting the number of observations that are considered during planning to compute tractable policies. In this paper we propose a new on-line POMDP solver, called Lazy Belief Extraction for Continuous POMDPs (LABECOP), that combines methods from Monte-Carlo Tree Search and particle filtering to construct a policy representation that does not require discretised observation spaces and avoids limiting the number of observations considered during planning. Experiments on three different problems involving continuous observation spaces indicate that LABECOP performs on par with or better than state-of-the-art POMDP solvers.
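To make the particle-filtering ingredient concrete, the following is a minimal sketch of the generic weighted particle-filter belief update that such solvers build on; it is not LABECOP itself. The hooks `transition_model` (samples a successor state) and `observation_density` (returns the likelihood of a continuous observation given a state and action) are hypothetical, problem-specific functions assumed for illustration.

```python
import random

def particle_filter_update(particles, action, observation,
                           transition_model, observation_density,
                           num_particles=1000):
    """One weighted belief update for a continuous observation.

    transition_model(state, action)            -> sampled next state
    observation_density(obs, state, action)    -> p(obs | state, action)
    Both are hypothetical problem-specific hooks.
    """
    # Propagate each particle through the (stochastic) transition model
    # and weight it by the likelihood of the received observation.
    next_states = [transition_model(s, action) for s in particles]
    weights = [observation_density(observation, s, action)
               for s in next_states]

    total = sum(weights)
    if total == 0.0:
        # Particle deprivation: no particle explains the observation.
        # A full solver would re-invigorate the belief; this sketch
        # simply falls back to the unweighted propagated set.
        return next_states

    # Resample with replacement, proportionally to the weights, to
    # obtain an unweighted particle set representing the new belief.
    return random.choices(next_states, weights=weights, k=num_particles)
```

Because the weighting step only evaluates a likelihood at the received observation, it applies unchanged to continuous observation spaces, which is why particle filtering pairs naturally with the on-line tree search described above.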