Rather than augmenting rewards with penalties for undesired behavior, Constrained Partially Observable Markov Decision Processes (CPOMDPs) plan safely by imposing inviolable budgets on constraint values. Previous work on online planning for CPOMDPs has been limited to discrete action and observation spaces. In this work, we propose algorithms for online CPOMDP planning in continuous state, action, and observation spaces by combining dual ascent with progressive widening. We empirically compare the effectiveness of our proposed algorithms on continuous CPOMDPs that model both toy and real-world safety-critical problems. Additionally, we compare against the use of online solvers for continuous unconstrained POMDPs that scalarize cost constraints into rewards, and investigate the effect of optimistic cost propagation.
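To make the core idea concrete, the following is a minimal, hypothetical Python sketch (not the paper's implementation) of how dual ascent on a Lagrange multiplier can be wrapped around Monte Carlo action selection with progressive widening in a continuous action space. The toy 1-D generative model (`step`), the rollout policy, the budget value, and all parameter names are illustrative assumptions, not details taken from the paper.

```python
import math
import random

def step(state, action):
    """Toy 1-D generative CPOMDP model: returns (next_state, observation, reward, cost)."""
    next_state = state + action + random.gauss(0.0, 0.1)
    observation = next_state + random.gauss(0.0, 0.1)
    reward = -abs(next_state)                        # prefer staying near the origin
    cost = 1.0 if abs(next_state) > 1.0 else 0.0     # constraint: avoid |s| > 1
    return next_state, observation, reward, cost

def rollout(state, depth, gamma=0.95):
    """Random-policy rollout returning separate discounted reward and cost returns."""
    ret_r, ret_c, discount = 0.0, 0.0, 1.0
    for _ in range(depth):
        action = random.uniform(-0.5, 0.5)
        state, _, r, c = step(state, action)
        ret_r += discount * r
        ret_c += discount * c
        discount *= gamma
    return ret_r, ret_c

def plan(belief_particles, cost_budget, iters=2000, depth=20,
         k_a=2.0, alpha_a=0.5, eta=0.05, gamma=0.95):
    """One planning call: dual ascent on lam around progressively widened action sampling."""
    lam = 0.0
    actions, values, costs, counts = [], [], [], []
    for n in range(1, iters + 1):
        # Progressive widening: grow the action set only while |A| <= k_a * n^alpha_a.
        if len(actions) <= k_a * n ** alpha_a:
            actions.append(random.uniform(-0.5, 0.5))
            values.append(0.0)
            costs.append(0.0)
            counts.append(0)
        # Select the action maximizing the scalarized (Lagrangian) value plus a UCB bonus.
        ucb = [values[i] - lam * costs[i] + math.sqrt(2 * math.log(n) / counts[i])
               if counts[i] > 0 else float("inf") for i in range(len(actions))]
        i = max(range(len(actions)), key=lambda j: ucb[j])
        # Sample a state from the belief, simulate one step, and finish with a rollout.
        s = random.choice(belief_particles)
        s2, _, r, c = step(s, actions[i])
        roll_r, roll_c = rollout(s2, depth, gamma)
        g_r = r + gamma * roll_r
        g_c = c + gamma * roll_c
        counts[i] += 1
        values[i] += (g_r - values[i]) / counts[i]
        costs[i] += (g_c - costs[i]) / counts[i]
        # Dual ascent: raise lam when the best action's estimated cost exceeds the budget,
        # and relax it toward zero otherwise.
        best = max(range(len(actions)), key=lambda j: values[j] - lam * costs[j])
        lam = max(0.0, lam + eta * (costs[best] - cost_budget))
    best = max(range(len(actions)), key=lambda j: values[j] - lam * costs[j])
    return actions[best], lam

if __name__ == "__main__":
    belief = [random.gauss(0.0, 0.2) for _ in range(100)]
    action, lam = plan(belief, cost_budget=0.5)
    print(f"chosen action {action:.3f}, final multiplier {lam:.3f}")
```

In this sketch the Lagrange multiplier scalarizes cost into the action-selection objective, while progressive widening controls how many continuous actions are ever instantiated; extending the same pattern to observation widening and full tree search would follow the same structure.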