Model-free continuous control for robot navigation with Deep Reinforcement Learning (DRL), which relies on noisy policies for exploration, is sensitive to the density of rewards. In practice, robots are usually deployed in cluttered environments containing many obstacles and narrow passageways. Designing dense, effective rewards in such environments is challenging, which leads to exploration problems during training. The problem becomes even more serious when tasks are described by temporal logic specifications. This work presents a deep policy gradient algorithm for controlling a robot with unknown dynamics in a cluttered environment when the task is specified as a Linear Temporal Logic (LTL) formula. To overcome the exploration challenge posed by the environment during training, we propose a novel path planning-guided reward scheme that integrates sampling-based methods to effectively complete goal-reaching missions. To facilitate LTL satisfaction, our approach decomposes the LTL mission into sub-goal-reaching tasks that are solved in a distributed manner. Our framework is shown to significantly improve both the performance (effectiveness, efficiency) and the exploration of robots tasked with complex missions in large-scale cluttered environments. A video demonstration is available at https://youtu.be/yMh_NUNWxho.
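To make the reward scheme concrete, the following is a minimal sketch of one way a path planning-guided dense reward could be computed, assuming a reference path (an array of waypoints, e.g., from an RRT*-style sampling-based planner) toward the current sub-goal; the function names, the deviation penalty, and the weight `w_dev` are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def path_progress(position: np.ndarray, path: np.ndarray) -> float:
    """Arc length along `path` up to the waypoint nearest to `position`."""
    seg_lengths = np.linalg.norm(np.diff(path, axis=0), axis=1)
    nearest = int(np.argmin(np.linalg.norm(path - position, axis=1)))
    return float(seg_lengths[:nearest].sum())

def shaped_reward(prev_pos: np.ndarray, curr_pos: np.ndarray,
                  path: np.ndarray, w_dev: float = 0.5) -> float:
    """Dense reward: progress made along the planned path this step,
    minus a penalty for deviating from the path (hypothetical form)."""
    progress = path_progress(curr_pos, path) - path_progress(prev_pos, path)
    deviation = float(np.min(np.linalg.norm(path - curr_pos, axis=1)))
    return progress - w_dev * deviation
```

Under this kind of shaping, the agent receives frequent feedback even far from the sub-goal, since any motion along the planner's geodesic is rewarded, which is what mitigates the sparse-reward exploration issue in cluttered, narrow-passage environments.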