Although risk awareness is fundamental to an agent operating online, it has received less attention in the challenging continuous domain and under partial observability. This paper presents a novel formulation and solution for a risk-averse, belief-dependent, probabilistically constrained continuous POMDP. We tackle a demanding setting of belief-dependent reward and constraint operators. The probabilistic confidence parameter makes our formulation genuinely risk-averse and considerably more flexible than the state-of-the-art chance constraint. Our rigorous analysis shows that in the strictest probabilistic confidence case, our formulation is very close to the chance constraint. However, our probabilistic formulation allows much faster and more accurate adaptive acceptance or pruning of actions that fulfill or violate the constraint. In addition, for an arbitrary confidence parameter, we are not aware of any analog to our approach. We present algorithms for solving our formulation in continuous domains. We also extend the chance-constrained approach to continuous environments using importance sampling. Moreover, all of our algorithms can be used with parametric and nonparametric beliefs represented by particles. Last but not least, we contribute, rigorously analyze, and simulate an approximation of chance-constrained continuous POMDP. The simulations demonstrate that our algorithms are dramatically faster than the baseline while achieving the same performance in terms of collisions.