Navigation algorithms are essential for deploying robots successfully in rapidly changing, hazardous environments for which prior knowledge of the configuration is often limited or unavailable. In such settings, traditional path-planning algorithms, which rely on localization and require detailed obstacle maps with goal locations, cannot be used. Vision-based algorithms are a promising alternative: visual information can be readily acquired by a robot's onboard sensors and provides a much richer source of information from which deep neural networks can extract complex patterns. Deep reinforcement learning has been used to achieve vision-based robot navigation; however, the efficacy of these algorithms in environments with dynamic obstacles and high variation in the configuration space has not been thoroughly investigated. In this paper, we employ a deep Dyna-Q learning algorithm for room evacuation and obstacle avoidance in partially observable environments, using low-resolution raw image data from an onboard camera. We evaluate the performance of a robotic agent in environments containing no obstacles, convex obstacles, and concave obstacles, both static and dynamic. The obstacles and the exit are placed at random positions at the start of each reinforcement learning episode. Overall, we show that our algorithm and training approach generalize to collision-free evacuation of environments with complex obstacle configurations. The results show that the agent can navigate to a goal location while avoiding multiple static and dynamic obstacles, and can escape from a concave obstacle while searching for and navigating to the exit.
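For readers unfamiliar with Dyna-Q, the core loop can be illustrated with a minimal tabular sketch. This is not the authors' deep, image-based implementation. It assumes a hypothetical 5x5 grid room with fixed static walls and a fixed exit, fully observable states given as (x, y) cells, and hypothetical names (`step`, `dyna_q`, `greedy_path`, `OBSTACLES`, `EXIT`). Each real step performs (1) a direct Q-learning update, (2) an update of a learned deterministic transition model, and (3) several planning updates on transitions replayed from that model. A deep variant would replace the Q table with a neural network over camera images.

```python
import random
from collections import defaultdict

# Hypothetical 5x5 grid room (an assumption, not the paper's environment):
# the agent starts at (0, 0) and the exit is fixed at (4, 4).
SIZE = 5
EXIT = (4, 4)
OBSTACLES = {(1, 1), (2, 1), (3, 1), (1, 3), (2, 3)}  # hypothetical static walls
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def step(state, action):
    """Deterministic transition: moves into walls or off the grid leave the agent in place."""
    nxt = (state[0] + action[0], state[1] + action[1])
    if not (0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE) or nxt in OBSTACLES:
        nxt = state
    done = nxt == EXIT
    reward = 1.0 if done else -0.01  # assumed reward: bonus at exit, small step penalty
    return nxt, reward, done

def dyna_q(episodes=100, n_planning=20, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, action_index)], initialised to 0
    model = {}              # learned model: (state, action_index) -> (reward, next_state, done)

    def greedy(s):
        # Break ties between equally valued actions at random.
        best = max(q[(s, a)] for a in range(len(ACTIONS)))
        return rng.choice([a for a in range(len(ACTIONS)) if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning target; terminal transitions do not bootstrap.
        target = r if done else r + gamma * max(q[(s2, b)] for b in range(len(ACTIONS)))
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done, t = (0, 0), False, 0
        while not done and t < 200:
            a = rng.randrange(len(ACTIONS)) if rng.random() < eps else greedy(s)
            s2, r, done = step(s, ACTIONS[a])
            update(s, a, r, s2, done)       # (1) direct RL from real experience
            model[(s, a)] = (r, s2, done)   # (2) model learning
            for _ in range(n_planning):     # (3) planning with simulated experience
                ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            s, t = s2, t + 1
    return q

def greedy_path(q, max_steps=50):
    """Follow the greedy policy from the start cell and record the visited cells."""
    s, path = (0, 0), [(0, 0)]
    for _ in range(max_steps):
        if s == EXIT:
            break
        a = max(range(len(ACTIONS)), key=lambda i: q[(s, i)])
        s, _, _ = step(s, ACTIONS[a])
        path.append(s)
    return path
```

Calling `greedy_path(dyna_q())` should return a wall-free route from (0, 0) to the exit. The `n_planning` parameter controls how many simulated updates accompany each real step; increasing it typically lets value information propagate with fewer real interactions, which is the main motivation for Dyna-style methods. Unlike the paper's setting, this sketch keeps the exit fixed and uses no dynamic obstacles.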