Planning under partial observability is essential for autonomous robots. A principled way to address such planning problems is the Partially Observable Markov Decision Process (POMDP). Although solving POMDPs is computationally intractable, substantial advancements have been achieved in developing approximate POMDP solvers in the past two decades. However, computing robust solutions for problems with continuous observation spaces remains challenging. Most on-line solvers rely on discretising the observation space or artificially limiting the number of observations that are considered during planning to compute tractable policies. In this paper we propose a new on-line POMDP solver, called Lazy Belief Extraction for Continuous POMDPs (LABECOP), that combines methods from Monte-Carlo Tree Search and particle filtering to construct a policy representation that does not require discretised observation spaces and avoids limiting the number of observations considered during planning. Experiments on three different problems involving continuous observation spaces indicate that LABECOP performs on par with or better than state-of-the-art POMDP solvers.
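To make the particle-filtering ingredient concrete, the following is a minimal sketch of the generic weighted particle-filter belief update that such solvers build on; it is not LABECOP itself. The hooks `transition_model` (samples a successor state) and `observation_density` (returns the likelihood of a continuous observation given a state and action) are hypothetical, problem-specific functions assumed for illustration.

```python
import random

def particle_filter_update(particles, action, observation,
                           transition_model, observation_density,
                           num_particles=1000):
    """One weighted belief update for a continuous observation.

    transition_model(state, action)            -> sampled next state
    observation_density(obs, state, action)    -> p(obs | state, action)
    Both are hypothetical problem-specific hooks.
    """
    # Propagate each particle through the (stochastic) transition model
    # and weight it by the likelihood of the received observation.
    next_states = [transition_model(s, action) for s in particles]
    weights = [observation_density(observation, s, action)
               for s in next_states]

    total = sum(weights)
    if total == 0.0:
        # Particle deprivation: no particle explains the observation.
        # A full solver would re-invigorate the belief; this sketch
        # simply falls back to the unweighted propagated set.
        return next_states

    # Resample with replacement, proportionally to the weights, to
    # obtain an unweighted particle set representing the new belief.
    return random.choices(next_states, weights=weights, k=num_particles)
```

Because the weighting step only evaluates a likelihood at the received observation, it applies unchanged to continuous observation spaces, which is why particle filtering pairs naturally with the on-line tree search described above.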