The partially observable Markov decision process (POMDP) is a principled, general framework for robot decision making under uncertainty, but POMDP planning suffers from high computational complexity when long-term planning is required. While temporally extended macro-actions help to cut down the effective planning horizon and significantly improve computational efficiency, how do we acquire good macro-actions? This paper proposes Macro-Action Generator-Critic (MAGIC), which performs offline learning of macro-actions optimized for online POMDP planning. Specifically, MAGIC learns a macro-action generator end-to-end, using an online planner's performance as the feedback. During online planning, the generator produces situation-aware macro-actions on the fly, conditioned on the robot's belief and the environment context. We evaluated MAGIC on several long-horizon planning tasks, both in simulation and on a real robot. The experimental results show that the learned macro-actions offer significant benefits to online planning performance, compared with primitive actions and handcrafted macro-actions.