Reinforcement Learning for Self-exploration in Narrow Spaces

In narrow spaces, motion planning built on a traditional hierarchical autonomous system can cause collisions because of noise in mapping, localization, and control. It also cannot operate without a map. To tackle these problems, we use deep reinforcement learning, which has proven effective for autonomous decision-making, to let the robot explore narrow spaces without a map while avoiding collisions. Specifically, using our Ackermann-steering, rectangular ZebraT robot and its Gazebo simulator, we propose a rectangular safety region to represent states and detect collisions for rectangular robots, together with a carefully crafted reward function that does not require destination information. We then benchmark five reinforcement learning algorithms, DDPG, DQN, SAC, PPO, and PPO-discrete, on a simulated narrow track. After training, the well-performing DDPG and DQN models transfer to three brand-new simulated tracks and, furthermore, to three real-world tracks.
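To make the rectangular safety region more concrete, here is a minimal Python sketch of one way such a region could be used for collision detection. It is not the authors' implementation. It assumes a 2D lidar mounted at the center of the rectangle, with beam angles measured from the robot's forward axis. The function names, the `margin` parameter, and the robot dimensions in the usage note are hypothetical. The idea is to subtract, for each beam, the distance from the center to the rectangle's edge, so that the remaining clearance reflects the robot's shape rather than a circular approximation.

```python
import math


def rect_boundary_distance(angle, half_length, half_width):
    """Distance from the rectangle's center to its edge along a ray.

    `angle` is in radians, 0 pointing along the robot's forward axis.
    (Hypothetical helper; assumes the sensor sits at the rectangle's center.)
    """
    c = abs(math.cos(angle))
    s = abs(math.sin(angle))
    # Distance at which the ray crosses the front/back edges and the side edges.
    to_front_back = half_length / c if c > 1e-9 else math.inf
    to_sides = half_width / s if s > 1e-9 else math.inf
    return min(to_front_back, to_sides)


def clearances(ranges, angles, half_length, half_width):
    """Per-beam free distance measured from the rectangle's edge, not its center."""
    return [
        r - rect_boundary_distance(a, half_length, half_width)
        for r, a in zip(ranges, angles)
    ]


def in_collision(ranges, angles, half_length, half_width, margin=0.0):
    """True if any obstacle lies inside the rectangle grown by `margin` along the beam."""
    return any(
        c <= margin for c in clearances(ranges, angles, half_length, half_width)
    )
```

For example, with a hypothetical robot of half-length 0.4 m and half-width 0.25 m, a side beam reading 0.2 m falls inside the rectangle and is flagged as a collision, whereas a circular region large enough to enclose the robot would also flag side readings up to about 0.47 m and so block passages the robot could actually fit through. The per-beam clearances could also serve as a state vector for the learning agent, though that is an assumption here as well.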
