Robotic systems are now capable of solving complex navigation tasks under real-world conditions. However, their capabilities are intrinsically limited by the designer's imagination and consequently fail to generalize to situations that were not considered at design time. This makes deep reinforcement learning especially interesting, as these algorithms promise a self-learning system that relies only on feedback from the environment. Letting the system itself search for an optimal solution brings the benefit of strong generalization, or even continual improvement when life-long learning is addressed. In this paper, we address robot navigation in continuous action space using deep hierarchical reinforcement learning without including the target location in the state representation. Our agent self-assigns internal goals and learns to extract reasonable waypoints to reach the desired target position based only on local sensor data. In our experiments, we demonstrate that our hierarchical structure improves the performance of the navigation agent in terms of collected reward and success rate compared to a flat structure, while not requiring any global or target information.
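To make the two-level structure described above concrete, the following PyTorch sketch shows one plausible reading of it: a high-level policy that self-assigns a waypoint sub-goal from local sensor data only (no target location or global map in its input), and a low-level policy that produces continuous actions conditioned on that sub-goal. This is an illustrative sketch, not the authors' architecture; all dimensions, network sizes, the sub-goal horizon `H`, and class names are assumptions.

```python
# Minimal sketch of a hierarchical navigation agent, assuming:
# a lidar-only observation, a relative (dx, dy) waypoint sub-goal,
# and continuous (linear, angular) velocity actions.
import torch
import torch.nn as nn

LIDAR_DIM = 36      # assumed number of local range readings
SUBGOAL_DIM = 2     # waypoint (dx, dy) relative to the robot
ACTION_DIM = 2      # continuous (linear, angular) velocity

class HighLevelPolicy(nn.Module):
    """Proposes a relative waypoint from local sensor data only;
    the target location never appears in its input."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LIDAR_DIM, 128), nn.ReLU(),
            nn.Linear(128, SUBGOAL_DIM), nn.Tanh(),  # waypoint in [-1, 1]^2
        )

    def forward(self, lidar):
        return self.net(lidar)

class LowLevelPolicy(nn.Module):
    """Maps sensor data plus the current sub-goal to continuous actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LIDAR_DIM + SUBGOAL_DIM, 128), nn.ReLU(),
            nn.Linear(128, ACTION_DIM), nn.Tanh(),   # bounded velocities
        )

    def forward(self, lidar, subgoal):
        return self.net(torch.cat([lidar, subgoal], dim=-1))

# Rollout skeleton: the high level re-assigns an internal goal every
# H steps, while the low level acts at every step.
high, low = HighLevelPolicy(), LowLevelPolicy()
H = 10                                    # assumed sub-goal horizon
lidar = torch.rand(1, LIDAR_DIM)          # stand-in for a real scan
subgoal = high(lidar)
for t in range(100):
    if t % H == 0:
        subgoal = high(lidar)             # self-assigned internal goal
    action = low(lidar, subgoal)
    # an environment step would go here; we just re-sample a fake scan
    lidar = torch.rand(1, LIDAR_DIM)
```

In such a setup, both policies would be trained with a deep RL algorithm for continuous control; the key property carried over from the abstract is that only local sensor data, never the target position, enters the state representation.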