The Partially Observable Markov Decision Process (POMDP) is a powerful framework for capturing decision-making problems that involve state and transition uncertainty. However, most current POMDP planners cannot effectively handle the very high-dimensional observations they often encounter in the real world (e.g., image observations in robotic domains). In this work, we propose Visual Tree Search (VTS), a learning and planning procedure that combines generative models learned offline with online model-based POMDP planning. VTS bridges offline model training and online planning by utilizing a set of deep generative observation models to predict and evaluate the likelihood of image observations in a Monte Carlo tree search planner. We show that VTS is robust to different observation noises and, since it utilizes online, model-based planning, can adapt to different reward structures without the need to re-train. This new approach outperforms a baseline state-of-the-art on-policy planning algorithm while using significantly less offline training time.
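The core mechanism described above, scoring observations with a learned likelihood model inside a particle-based planner, can be sketched in miniature. This is an illustrative toy, not the VTS implementation: `obs_likelihood` stands in for a deep generative observation model p(o | s), replaced here by a simple Gaussian kernel over scalar states, and the reweighting step mirrors the belief update a particle-filter tree search would perform at each observation node.

```python
import math

def obs_likelihood(obs, state):
    # Placeholder for a learned generative model p(o | s).
    # Here: an (unnormalized) Gaussian kernel around the state.
    return math.exp(-0.5 * (obs - state) ** 2)

def reweight(particles, obs):
    """Weight each state particle by the likelihood of the received
    observation, then normalize -- the standard particle-filter
    belief update used at observation nodes of the search tree."""
    weights = [obs_likelihood(obs, s) for s in particles]
    total = sum(weights)
    return [w / total for w in weights]

# Three hypothesized states; the observation is most consistent
# with the middle one, so it receives the largest weight.
particles = [0.0, 1.0, 2.0]
weights = reweight(particles, obs=1.0)
```

In VTS the likelihood model is trained offline on images, while the reweighting itself happens online during planning, which is what lets the planner adapt to new reward structures without retraining the model.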