We present a Deep Reinforcement Learning (DRL) algorithm for a task-guided robot with unknown continuous-time dynamics deployed in a large-scale, complex environment. Linear Temporal Logic (LTL) is used to express rich robotic specifications. To overcome the challenges posed by the environment, we propose a novel path-planning-guided reward scheme that is dense over the state space and, crucially, robust to the infeasibility of computed geometric paths caused by the unknown robot dynamics. To facilitate LTL satisfaction, our approach decomposes the LTL mission into sub-tasks that are solved using distributed DRL, with the sub-tasks trained in parallel using Deep Policy Gradient algorithms. Our framework is shown to significantly improve the performance (effectiveness and efficiency) and exploration of robots tasked with complex missions in large-scale, complex environments.
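The abstract does not say how the path-planning-guided reward is computed. What follows is a minimal sketch under our own assumptions, not the authors' method. It assumes a 2-D occupancy grid, uses breadth-first search as a stand-in for the geometric path planner, and defines the dense reward as the per-step decrease in a path-based distance-to-goal. The robot is never required to track the path exactly: any motion that brings it closer to the goal, measured along the path, earns positive reward. This is one plausible way to tolerate geometric paths that the unknown dynamics cannot follow. The names `bfs_path`, `path_progress`, and `dense_reward` are hypothetical.

```python
import math
from collections import deque
from typing import List, Optional, Sequence, Tuple

Cell = Tuple[int, int]


def bfs_path(grid: List[List[int]], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Hypothetical planner: shortest 4-connected path on an occupancy grid
    (0 = free, 1 = obstacle). Ignores robot dynamics entirely, so the result
    may be infeasible for the real robot. Returns None if goal is unreachable."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            break
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nxt = (nr, nc)
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    if goal not in parent:
        return None
    # Walk parent pointers back from the goal, then reverse.
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def path_progress(path: List[Cell], pos: Sequence[float]) -> float:
    """Path-based distance-to-goal for a continuous position: remaining path
    length from the nearest waypoint, plus the Euclidean gap to that waypoint."""
    idx = min(range(len(path)), key=lambda i: math.dist(path[i], pos))
    remaining = len(path) - 1 - idx  # each 4-connected step has unit length
    return remaining + math.dist(path[idx], pos)


def dense_reward(path: List[Cell], prev_pos: Sequence[float], pos: Sequence[float]) -> float:
    """Dense shaping reward: positive when the robot gets closer to the goal
    along the path, negative when it moves away."""
    return path_progress(path, prev_pos) - path_progress(path, pos)


if __name__ == "__main__":
    grid = [[0, 0, 0],
            [1, 1, 0],
            [0, 0, 0]]
    path = bfs_path(grid, (0, 0), (2, 0))
    print(path)
    print(dense_reward(path, (0, 0), (0.0, 0.6)))  # off-waypoint motion still rewarded
```

Projecting onto the nearest waypoint is a deliberately simple choice; in cluttered maps it can pick a waypoint that is close in Euclidean distance but separated by an obstacle, so a real implementation would likely need a more careful projection.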