In this study I propose a belief-filtering method for improving the performance of Partially Observable Markov Decision Processes (POMDPs), a framework widely used in autonomous robotics and many other domains involving control policies. My method searches for and compares every pair of similar beliefs. Because a belief that closely resembles another has an insignificant influence on the control policy, it is filtered out to reduce training time. The empirical results show that the proposed method outperforms point-based approximate POMDP methods in terms of both the quality of the training results and the efficiency of the method.
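As a rough illustration of the filtering idea, the sketch below drops any belief that lies close to one already kept. The abstract does not state how similarity is measured, so the L1 distance, the `threshold` value, the greedy keep-first order, and the function name `filter_beliefs` are all assumptions made for this example and not the method's actual implementation.

```python
import numpy as np


def filter_beliefs(beliefs, threshold=0.05):
    """Greedily keep beliefs that differ from every kept belief.

    Assumption: two beliefs count as "similar" when their L1 distance
    is at most `threshold`. The source does not specify the metric.
    """
    kept = []
    for b in beliefs:
        b = np.asarray(b, dtype=float)
        # Compare the candidate against every belief kept so far.
        if all(np.abs(b - k).sum() > threshold for k in kept):
            kept.append(b)
    return kept


# Hypothetical belief points over a two-state POMDP.
beliefs = [
    [0.50, 0.50],
    [0.51, 0.49],  # L1 distance 0.02 from the first belief, so it is dropped
    [0.90, 0.10],
    [0.10, 0.90],
]
filtered = filter_beliefs(beliefs, threshold=0.05)
print(len(filtered))  # 3
```

In a point-based solver, a reduced set like `filtered` would stand in for the full belief set during value backups, which is where the saving in training time would be expected to come from.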