Exploration is a fundamental challenge in Deep Reinforcement Learning (DRL)-based model-free navigation control, since typical exploration techniques for target-driven navigation tasks rely on noise-based or greedy policies, which are sensitive to the density of rewards. In practice, robots are often deployed in complex, cluttered environments containing dense obstacles and narrow passageways, which naturally give rise to sparse rewards that are hard to explore during training. The problem becomes even more serious when the pre-defined tasks are complex and highly expressive. In this paper, we focus on these two aspects and present a deep policy gradient algorithm for task-guided robots with unknown dynamics deployed in complex, cluttered environments. Linear Temporal Logic (LTL) is applied to express rich robotic specifications. To overcome the exploration challenge posed by the environment during training, we propose a novel path planning-guided reward scheme that is dense over the state space and, crucially, robust to the infeasibility of the computed geometric paths caused by the black-box dynamics. To facilitate LTL satisfaction, our approach decomposes the LTL mission into sub-tasks that are solved using distributed DRL, where the sub-tasks can be trained in parallel using deep policy gradient algorithms. Our framework is shown to significantly improve the performance (effectiveness and efficiency) and exploration of robots tasked with complex missions in large-scale, complex environments. A video demo is available on YouTube: https://youtu.be/YQRQ2-yMtIk.
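To make the reward idea concrete, the sketch below illustrates one plausible realization of a path planning-guided dense reward: the agent is credited for new arc-length progress along a geometric path produced by an off-the-shelf planner (e.g., A* or RRT). This is a minimal illustrative sketch, not the paper's exact formulation; the function name, the 0.5 goal radius, and the step penalty are all assumptions introduced for illustration.

```python
import numpy as np

def path_progress_reward(state_xy, waypoints, prev_progress,
                         goal_bonus=10.0, step_penalty=0.01):
    """Hypothetical dense reward shaped by progress along a planner path.

    state_xy:      robot position (x, y)
    waypoints:     np.ndarray of shape (N, 2), path from a geometric planner
    prev_progress: cumulative path length already credited to the agent
    Returns (reward, updated_progress).
    """
    # Find the waypoint nearest to the robot and the arc length up to it.
    dists = np.linalg.norm(waypoints - np.asarray(state_xy), axis=1)
    nearest = int(np.argmin(dists))
    seg_lengths = np.linalg.norm(np.diff(waypoints, axis=0), axis=1)
    progress = float(seg_lengths[:nearest].sum())

    # Credit only *new* progress: the signal stays dense over the state
    # space, but detours forced by the black-box dynamics (i.e., an
    # infeasible geometric path) are not punished with negative shaping.
    reward = max(progress - prev_progress, 0.0) - step_penalty

    # Assumed terminal bonus when the final waypoint region is reached.
    if nearest == len(waypoints) - 1 and dists[nearest] < 0.5:
        reward += goal_bonus
    return reward, max(progress, prev_progress)
```

One design note on this sketch: clamping the shaping term at zero keeps the credited progress monotone, which is one simple way to obtain the robustness to path infeasibility that the abstract claims, since the agent can deviate from an untraversable segment without accumulating penalties.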