When robots operate in the real world, they must handle uncertainties in sensing, acting, and environment dynamics. Many tasks also require reasoning about the long-term consequences of robot decisions. The partially observable Markov decision process (POMDP) offers a principled approach to planning under uncertainty, but its computational complexity grows exponentially with the planning horizon. We propose to use temporally extended macro-actions to cut down the effective planning horizon and thus the exponential factor of the complexity. To this end, we introduce Macro-Action Generator-Critic (MAGIC), an algorithm that learns a macro-action generator from planner feedback and, in turn, uses the learned macro-actions to condition long-horizon planning. Importantly, the generator is trained to directly maximize downstream planning performance. We evaluate MAGIC on several long-horizon planning tasks, showing that it significantly outperforms planning with primitive actions and with hand-crafted macro-actions, both in simulation and on a real robot.
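To make the generator-critic idea concrete, below is a minimal, self-contained sketch of such a training loop, assuming a black-box planner whose value estimate serves as non-differentiable feedback. The class and function names (`Generator`, `Critic`, `plan_with_macros`) and all dimensions are illustrative assumptions, not the authors' implementation; the planner is stubbed out with a toy scoring function.

```python
# Hedged sketch of a generator-critic loop for learning macro-actions.
# Assumptions: a generator proposes macro-action parameters from a context/belief
# vector, a black-box planner scores them (non-differentiable), and a critic
# regresses that score so the generator can be trained by gradient ascent on it.
import torch
import torch.nn as nn

CTX_DIM, MACRO_DIM = 16, 8  # illustrative sizes

class Generator(nn.Module):
    """Maps a belief/context vector to macro-action parameters."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(CTX_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, MACRO_DIM))
    def forward(self, ctx):
        return torch.tanh(self.net(ctx))

class Critic(nn.Module):
    """Predicts the planner's value for a (context, macro-action) pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(CTX_DIM + MACRO_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, ctx, macros):
        return self.net(torch.cat([ctx, macros], dim=-1))

def plan_with_macros(ctx, macros):
    """Stand-in for a POMDP planner: returns a scalar value per sample."""
    with torch.no_grad():  # planner feedback is treated as non-differentiable
        return -(macros - ctx[:, :MACRO_DIM]).pow(2).sum(dim=-1, keepdim=True)

gen, critic = Generator(), Critic()
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

for step in range(200):
    ctx = torch.randn(32, CTX_DIM)                  # batch of task contexts
    macros = gen(ctx)                               # proposed macro-actions
    value = plan_with_macros(ctx, macros.detach())  # planner feedback

    # Critic: regress the planner's achieved value.
    critic_loss = (critic(ctx, macros.detach()) - value).pow(2).mean()
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    # Generator: ascend the critic's estimate of downstream planning performance.
    gen_loss = -critic(ctx, gen(ctx)).mean()
    opt_g.zero_grad(); gen_loss.backward(); opt_g.step()
```

In this sketch the critic exists only to give the generator a differentiable surrogate for the planner's performance, which mirrors the abstract's point that the generator is optimized directly for downstream planning quality rather than for an auxiliary objective.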