Online decision making under uncertainty in partially observable domains, also known as Belief Space Planning, is a fundamental problem in robotics and Artificial Intelligence. Due to an abundance of plausible future unfoldings, calculating an optimal course of action inflicts an enormous computational burden on the agent. Moreover, in many scenarios, e.g., information gathering, it is required to introduce a belief-dependent constraint. Prompted by this demand, in this paper we consider a recently introduced probabilistic belief-dependent constrained POMDP. We present a technique to adaptively accept or discard a candidate action sequence with respect to a probabilistic belief-dependent constraint, before expanding a complete set of future observation samples and without any loss in accuracy. Moreover, using our proposed framework, we contribute an adaptive method to find a maximal feasible return (e.g., information gain) in terms of Value at Risk for the candidate action sequence, with substantial acceleration. On top of that, we introduce an adaptive simplification technique for the probabilistically constrained setting. This approach provably returns a solution of identical quality while dramatically accelerating online decision making. Our universal framework applies to any belief-dependent constrained continuous POMDP with parametric beliefs, as well as nonparametric beliefs represented by particles. In the context of an information-theoretic constraint, our presented framework stochastically quantifies whether the cumulative information gain along the planning horizon is sufficiently significant (e.g., for information gathering, active SLAM). We apply our method to active SLAM, a highly challenging high-dimensional Belief Space Planning problem. Extensive realistic simulations corroborate the superiority of our proposed ideas.
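To make the probabilistic constraint concrete, the following is a minimal sketch (not the paper's algorithm) of how such a chance constraint and the associated Value at Risk can be evaluated empirically from sampled cumulative returns. All function names, the sample distribution, and the parameters (`threshold`, `epsilon`) are illustrative assumptions, not part of the original work.

```python
import numpy as np


def satisfies_prob_constraint(returns, threshold, epsilon):
    """Empirical check of the chance constraint P(return >= threshold) >= 1 - epsilon.

    `returns` are samples of a cumulative belief-dependent return
    (e.g., information gain) along the planning horizon.
    """
    returns = np.asarray(returns, dtype=float)
    return float(np.mean(returns >= threshold)) >= 1.0 - epsilon


def value_at_risk(returns, epsilon):
    """Empirical Value at Risk at level epsilon: the epsilon-quantile of the
    return samples, i.e., the largest delta such that approximately
    P(return >= delta) >= 1 - epsilon holds on the samples."""
    return float(np.quantile(np.asarray(returns, dtype=float), epsilon))


# Hypothetical usage: sampled cumulative information gains for one candidate
# action sequence; accept it only if the chance constraint holds.
rng = np.random.default_rng(0)
gains = rng.normal(loc=2.0, scale=0.5, size=1000)
accepted = satisfies_prob_constraint(gains, threshold=1.0, epsilon=0.05)
max_feasible_return = value_at_risk(gains, epsilon=0.05)
```

The adaptive element of the paper lies in deciding acceptance or rejection *before* the full set of observation samples is expanded; the sketch above only shows the terminal check performed on a complete sample set.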