In this paper, we consider online planning in partially observable domains. Solving the corresponding POMDP problem is a very challenging task, particularly in an online setting. Our key contribution is a novel algorithmic approach, Simplified Information Theoretic Belief Space Planning (SITH-BSP), which aims to speed up POMDP planning with belief-dependent rewards without compromising the accuracy of the solution. We do so by mathematically relating the simplified elements of the problem to their counterparts in the original problem. Specifically, we focus on belief simplification and use it to formulate bounds on the corresponding original belief-dependent rewards. These bounds, in turn, are used to perform branch pruning over the belief tree while calculating the optimal policy. We further introduce the notion of adaptive simplification, re-using calculations across different simplification levels, and exploit it to prune, at each level in the belief tree, all branches but one. Our approach is therefore guaranteed to find the optimal solution of the original problem, but with a substantial speedup. As a second key contribution, we derive novel analytical bounds on differential entropy for a sampling-based belief representation, which we believe are of interest in their own right. We validate our approach in simulation using these bounds, with simplification corresponding to reducing the number of belief samples, and exhibit a significant computational speedup while still obtaining the optimal solution.
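The pruning mechanism described above can be sketched in a few lines. The following is a minimal illustration, not the paper's actual algorithm: it assumes that, at each simplification level, every candidate action has an interval bound on its value, and that tightening a bound (e.g., by adding belief samples) is supplied by a hypothetical `tighten` callback. An action whose upper bound falls below the best lower bound can never be optimal, so pruning is lossless with respect to the original problem.

```python
from typing import Callable, Dict, Tuple

# (lower, upper) bounds on an action's value, induced by the simplified belief
Interval = Tuple[float, float]


def prune(bounds: Dict[str, Interval]) -> Dict[str, Interval]:
    """Discard actions whose upper bound is below the best lower bound.

    Such actions cannot be optimal under any bound-consistent true value,
    so removing them preserves the optimal solution of the original problem.
    """
    best_lb = max(lb for lb, _ in bounds.values())
    return {a: (lb, ub) for a, (lb, ub) in bounds.items() if ub >= best_lb}


def adaptive_prune(
    bounds: Dict[str, Interval],
    tighten: Callable[[str, int], Interval],
    max_level: int = 10,
) -> Dict[str, Interval]:
    """Adaptively increase the simplification level until one action survives.

    Each level tightens the bounds only for the surviving actions
    (re-using coarser-level computation is where the speedup comes from),
    then prunes again. Stops early once a single branch remains.
    """
    surviving = prune(bounds)
    level = 0
    while len(surviving) > 1 and level < max_level:
        level += 1
        surviving = prune({a: tighten(a, level) for a in surviving})
    return surviving
```

For example, with bounds `{"a1": (0.2, 0.9), "a2": (0.5, 0.7), "a3": (0.1, 0.4)}`, a single `prune` call removes `a3` (its upper bound 0.4 is below the best lower bound 0.5), and `adaptive_prune` keeps tightening until only one action survives.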