The Partially Observable Markov Decision Process (POMDP) is a powerful framework for capturing decision-making problems that involve state and transition uncertainty. However, most current POMDP planners cannot effectively handle the high-dimensional image observations prevalent in real-world applications, and they often require lengthy online training with environment interaction. In this work, we propose Visual Tree Search (VTS), a compositional learning and planning procedure that combines generative models learned offline with online model-based POMDP planning. The deep generative observation models evaluate the likelihood of, and predict, future image observations within a Monte Carlo tree search planner. We show that VTS is robust to types of image noise that were not present during training and can adapt to different reward structures without retraining. This new approach significantly and stably outperforms several state-of-the-art vision POMDP baselines while using only a fraction of the training time.
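To illustrate the role a learned generative observation model can play inside a particle-based planner, the sketch below shows a single bootstrap particle-filter belief update in which a learned conditional density supplies the image-observation likelihood. This is a minimal illustrative sketch, not the authors' implementation; `transition_fn` and `obs_likelihood_fn` are hypothetical placeholders standing in for an environment/transition model and a learned observation-likelihood model, respectively.

```python
# Minimal sketch: particle-based belief update where a learned generative
# observation model supplies p(o | s'), the kind of component a vision
# POMDP planner can query during tree search. Interfaces are hypothetical.
import numpy as np

def belief_update(particles, action, observation,
                  transition_fn, obs_likelihood_fn, rng):
    """One bootstrap particle-filter step.

    particles:          array of sampled states representing the current belief
    transition_fn:      samples s' ~ T(s' | s, a)      (assumed transition model)
    obs_likelihood_fn:  returns p(o | s') from a learned conditional density
                        (assumed generative observation model)
    """
    # Propagate each particle through the transition model.
    propagated = np.array([transition_fn(s, action, rng) for s in particles])

    # Weight particles by the learned likelihood of the observed image.
    weights = np.array([obs_likelihood_fn(observation, s) for s in propagated])
    weights = weights / (weights.sum() + 1e-12)

    # Resample to recover an equally weighted particle set.
    idx = rng.choice(len(propagated), size=len(propagated), p=weights)
    return propagated[idx]
```

In a tree-search planner, the same learned observation model can also be sampled to generate candidate future observations at simulated nodes, which is the combination of offline-learned models and online planning that the abstract describes at a high level.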