Risk awareness is fundamental for an agent operating online. However, it has received less attention in challenging continuous domains under partial observability. Existing constrained POMDP algorithms are typically designed for discrete state and observation spaces. In addition, current solvers for constrained formulations do not support general belief-dependent constraints. Crucially, in the POMDP setting, risk awareness expressed through constraints has been addressed only to a limited extent. This paper presents a novel formulation for risk-averse belief-dependent constrained POMDPs. Our probabilistic constraint is general and belief-dependent, as is the reward function. The proposed universal framework applies to continuous domains with either nonparametric beliefs represented by particles or parametric beliefs. We show that our formulation accounts for risk better than previous approaches.
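The abstract does not state the formulation itself, so the following is only a minimal sketch of what a probabilistic, belief-dependent constraint over particle beliefs can look like, not the authors' method. Everything in it is assumed for illustration: a hypothetical 1-D random-walk model with Gaussian motion and observation noise, a bootstrap particle filter, a belief-dependent safety indicator `belief_is_safe` that requires at least 1 − ε of the particle mass inside a safe interval, and a Monte Carlo estimate (`constraint_probability`) of the probability that every future belief along a fixed action sequence is safe, which is then compared against a risk level 1 − δ.

```python
import math
import random

# Hypothetical 1-D model used only for illustration (not from the paper).
MOTION_STD = 0.05   # process noise on x' = x + a
OBS_STD = 0.1       # observation noise on z = x + noise
SAFE_BOUND = 1.0    # hypothetical safe set: |x| <= SAFE_BOUND


def propagate(x, a, rng):
    """Sample x' ~ p(x' | x, a) for the assumed random-walk motion model."""
    return x + a + rng.gauss(0.0, MOTION_STD)


def obs_likelihood(z, x):
    """Unnormalized Gaussian likelihood p(z | x)."""
    return math.exp(-0.5 * ((z - x) / OBS_STD) ** 2)


def belief_update(particles, a, z, rng):
    """Bootstrap particle filter step: propagate, weight by p(z | x), resample."""
    moved = [propagate(x, a, rng) for x in particles]
    weights = [obs_likelihood(z, x) for x in moved]
    if sum(weights) == 0.0:
        # Degenerate case: every weight underflowed; keep the propagated set.
        return moved
    return rng.choices(moved, weights=weights, k=len(moved))


def belief_is_safe(particles, eps):
    """Belief-dependent indicator: at least (1 - eps) of the mass is in the safe set."""
    inside = sum(1 for x in particles if abs(x) <= SAFE_BOUND)
    return inside / len(particles) >= 1.0 - eps


def constraint_probability(particles, actions, eps, n_rollouts, rng):
    """Monte Carlo estimate of P(every future belief is safe | b0, actions)."""
    successes = 0
    for _ in range(n_rollouts):
        state = rng.choice(particles)  # sample a "true" state from b0
        belief = list(particles)
        all_safe = True
        for a in actions:
            state = propagate(state, a, rng)
            z = state + rng.gauss(0.0, OBS_STD)  # simulated observation
            belief = belief_update(belief, a, z, rng)
            if not belief_is_safe(belief, eps):
                all_safe = False
                break
        successes += all_safe
    return successes / n_rollouts


if __name__ == "__main__":
    rng = random.Random(0)
    b0 = [rng.gauss(0.0, 0.1) for _ in range(200)]
    delta = 0.1  # hypothetical risk level: require probability >= 1 - delta
    for name, plan in [("stay", [0.0, 0.0, 0.0]), ("push", [1.0, 1.0, 1.0])]:
        p = constraint_probability(b0, plan, eps=0.05, n_rollouts=50, rng=rng)
        print(name, p, "satisfied" if p >= 1.0 - delta else "violated")
```

Under these assumptions, holding position keeps almost all belief mass inside the safe interval, so the estimate is close to 1, whereas pushing toward the boundary violates the constraint within the first steps. Note that safety is judged on the belief (the particle set) rather than on the true state, which is what makes the constraint belief-dependent.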