Solving robotic navigation tasks via reinforcement learning (RL) is challenging due to their sparse rewards and long decision horizons. However, in many navigation tasks, high-level (HL) task representations, like a rough floor plan, are available. Previous work has demonstrated efficient learning by hierarchical approaches that combine path planning in the HL representation with sub-goals derived from the plan to guide the RL policy in the source task. However, these approaches usually neglect the complex dynamics and sub-optimal sub-goal-reaching capabilities of the robot during planning. This work overcomes these limitations by proposing a novel hierarchical framework that utilizes a trainable planning policy for the HL representation. Thereby, robot capabilities and environment conditions can be learned from collected rollout data. We specifically introduce a planning policy based on value iteration with a learned transition model (VI-RL). In simulated robotic navigation tasks, VI-RL yields consistent, strong improvements over vanilla RL; it is on par with vanilla hierarchical RL on single layouts but more broadly applicable to multiple layouts, and it is on par with trainable HL path-planning baselines except for a parking task with difficult non-holonomic dynamics, where it shows marked improvements.
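To make the core idea concrete, the following is a minimal sketch (not the paper's implementation) of value iteration over a coarse HL floor-plan grid, where the transition probabilities come from a learned model of the low-level policy's sub-goal-reaching success. All names (e.g. `learned_success_prob`), the grid size, and the discount factor are illustrative assumptions.

```python
import numpy as np

H, W = 8, 8                                    # coarse floor-plan grid
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # HL moves: up, down, left, right
GOAL = (7, 7)
GAMMA = 0.95

def learned_success_prob(cell, action):
    """Placeholder for a learned transition model: probability that the
    low-level RL policy actually reaches the requested neighbouring cell.
    In a VI-RL-style setup this would be fitted from collected rollout data."""
    # Illustrative assumption: moves are less reliable near the grid border.
    r, c = cell
    return 0.9 if 0 < r < H - 1 and 0 < c < W - 1 else 0.6

def value_iteration(n_iters=100):
    V = np.zeros((H, W))
    for _ in range(n_iters):
        V_new = np.copy(V)
        for r in range(H):
            for c in range(W):
                if (r, c) == GOAL:
                    V_new[r, c] = 0.0          # terminal cell: no further cost
                    continue
                q_values = []
                for dr, dc in ACTIONS:
                    nr = min(max(r + dr, 0), H - 1)
                    nc = min(max(c + dc, 0), W - 1)
                    p = learned_success_prob((r, c), (dr, dc))
                    # Step reward -1; with prob. p the neighbour is reached,
                    # otherwise the robot stays in its current cell.
                    q = -1.0 + GAMMA * (p * V[nr, nc] + (1 - p) * V[r, c])
                    q_values.append(q)
                V_new[r, c] = max(q_values)
        V = V_new
    return V

if __name__ == "__main__":
    V = value_iteration()
    # The greedy HL action w.r.t. V at the robot's current cell would serve
    # as the next sub-goal passed to the low-level RL policy.
    print(np.round(V, 1))
```

The design choice illustrated here is that planning on the HL grid is not done with fixed, idealized transitions: the planner's backups use success probabilities estimated from the robot's actual rollouts, so the resulting sub-goals reflect its true capabilities and the environment conditions.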