Due to recent breakthroughs, reinforcement learning (RL) has demonstrated impressive performance in challenging sequential decision-making problems. However, an open question is how to make RL cope with partial observability, which is prevalent in many real-world problems. In contrast to contemporary RL approaches, which mostly focus on improved memory representations or rely on strong assumptions about the type of partial observability, we propose a simple but efficient approach that can be combined with a wide variety of RL methods. Our main insight is that smoothly transitioning from full observability to partial observability during training yields a high-performance policy. The approach, called partially observable guided reinforcement learning (PO-GRL), allows full state information to be exploited during policy optimization without compromising the optimality of the final policy. A comprehensive evaluation on discrete partially observable Markov decision process (POMDP) benchmark problems and continuous partially observable MuJoCo and OpenAI Gym tasks shows that PO-GRL improves performance. Finally, we demonstrate PO-GRL on the ball-in-the-cup task with a real Barrett WAM robot under partial observability.
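As a rough illustration of the idea of gradually moving from full to partial observability, the following Python sketch blends the full state into the agent's observation with a coefficient that is annealed to zero over the course of training. The blending scheme, the linear schedule, and all function and variable names are assumptions introduced for illustration only; the abstract does not specify how the transition is implemented.

```python
import numpy as np

def guided_observation(full_state, partial_obs, progress):
    """Return the observation handed to the policy during training.

    progress: fraction of training completed, in [0, 1].
    alpha = 1 corresponds to full observability at the start of training,
    alpha = 0 to the true partially observable setting at the end.
    This linear anneal is a hypothetical choice, not the paper's schedule.
    """
    alpha = max(0.0, 1.0 - progress)
    return alpha * np.asarray(full_state) + (1.0 - alpha) * np.asarray(partial_obs)

# Example: early in training the agent effectively sees the full state,
# late in training it only sees the partial observation.
full_state = np.array([0.3, -1.2, 0.7])   # e.g. positions and velocities
partial_obs = np.array([0.3, 0.0, 0.0])   # e.g. velocities masked out
print(guided_observation(full_state, partial_obs, progress=0.1))
print(guided_observation(full_state, partial_obs, progress=1.0))
```

Because the final policy is trained (and evaluated) with alpha = 0, it ultimately depends only on the partial observation, which is consistent with the claim that the guidance does not compromise the optimality of the final policy.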