Partially Observable Markov Decision Processes (POMDP) is a widely used model to represent the interaction of an environment and an agent, under state uncertainty. Since the agent does not observe the environment state, its uncertainty is typically represented through a probabilistic belief. While the set of possible beliefs is infinite, making exact planning intractable, the belief space's complexity (and hence planning complexity) is characterized by its covering number. Many POMDP solvers uniformly discretize the belief space and give the planning error in terms of the (typically unknown) covering number. We instead propose an adaptive belief discretization scheme, and give its associated planning error. We furthermore characterize the covering number with respect to the POMDP parameters. This allows us to specify the exact memory requirements on the planner, needed to bound the value function error. We then propose a novel, computationally efficient solver using this scheme. We demonstrate that our algorithm is highly competitive with the state of the art in a variety of scenarios.
翻译:部分可观察的 Markov 决策程序(POMDP) 是一个广泛使用的模型,用来代表国家不确定情况下的环境和代理人的相互作用。由于代理人不观察环境状态,其不确定性通常通过概率性信念来表示。虽然一套可能的信念是无限的,使得精确的规划难以实现,但信仰空间的复杂性(因此也是规划的复杂性)的特点是其覆盖号。许多POMDP 解答器统一了信仰空间的离散,并给出了覆盖数字(通常未知)方面的规划错误。我们相反地提出了一个适应性信仰分离计划,并给出了相关的规划错误。我们用POMDP参数来描述覆盖数字的特性。这使我们能够确定规划者的确切记忆要求,从而约束价值函数错误。我们然后用这个计划提出一个新的、计算高效的解决方案。我们证明我们的算法在各种情景中与艺术状态具有高度竞争力。