Using a novel toy nautical navigation environment, we show that dynamic programming can be applied when only incomplete information about a partially observed Markov decision process (POMDP) is available. By incorporating uncertainty into our model, we show that navigation policies can be constructed that maintain safety, outperforming the baseline performance of traditional dynamic programming for Markov decision processes (MDPs). By further incorporating controlled sensing methods, we show that these policies can simultaneously reduce measurement costs.
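As a rough illustration only (the channel layout, reward values, and worst-case backup below are hypothetical assumptions, not the paper's actual environment or algorithm), the following sketch shows one simple way to fold model uncertainty into a dynamic-programming backup: value iteration on a 1-D "channel" with a hazard cell, where the leftward drift probability of the current is only known to lie in an interval and each backup evaluates the worst drift in that interval, biasing the resulting policy toward safety. The controlled-sensing aspect (paying a measurement cost to sharpen a belief) is not modelled here.

```python
import numpy as np

# Hypothetical toy setup: a 1-D channel of N cells with an absorbing goal and an
# absorbing hazard (e.g. a shoal). The current's leftward drift probability is only
# known to lie in DRIFT_RANGE; because the expected value is linear in the drift
# probability, the worst case over the interval is attained at an endpoint, so it
# suffices to check the two endpoints in each backup.
N = 10
GOAL, HAZARD = N - 1, 3
ACTIONS = (-1, +1)            # steer left / steer right
DRIFT_RANGE = (0.1, 0.4)      # incomplete knowledge of the drift probability
GAMMA = 0.95

def successors(s, a, drift_p):
    """Transition distribution {next_state: prob} for a given drift probability."""
    if s in (GOAL, HAZARD):                  # absorbing states
        return {s: 1.0}
    intended = min(max(s + a, 0), N - 1)
    drifted = max(intended - 1, 0)           # current pushes the boat one cell left
    if intended == drifted:
        return {intended: 1.0}
    return {intended: 1.0 - drift_p, drifted: drift_p}

def reward(s):
    if s == GOAL:
        return 0.0
    return -50.0 if s == HAZARD else -1.0    # large safety penalty, small time penalty

# Robust value iteration: best action against the least favourable drift probability.
V = np.zeros(N)
for _ in range(1000):
    V_new = np.empty_like(V)
    for s in range(N):
        V_new[s] = reward(s) + GAMMA * max(
            min(sum(p * V[s2] for s2, p in successors(s, a, d).items())
                for d in DRIFT_RANGE)
            for a in ACTIONS
        )
    if np.max(np.abs(V_new - V)) < 1e-6:
        V = V_new
        break
    V = V_new

# Greedy worst-case policy extracted from the converged values.
policy = [max(ACTIONS, key=lambda a: min(
              sum(p * V[s2] for s2, p in successors(s, a, d).items())
              for d in DRIFT_RANGE))
          for s in range(N)]
print("values:", np.round(V, 1))
print("policy:", policy)
```

In a full belief-state formulation, a costly observation action would enter the same kind of backup through its effect on the belief; this sketch covers only the uncertainty-aware, safety-penalized dynamic-programming side.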