How to navigate effectively in crowded environments while meeting socially acceptable standards remains a key problem for the development of mobile robots. Recent work has shown the effectiveness of deep reinforcement learning for crowd navigation, but learning becomes progressively less effective as pedestrian speed increases. To improve the effectiveness of deep reinforcement learning, we redesign the reward function by introducing a penalty term on the relative speed between the robot and pedestrians. The newly designed reward function is tested on three mainstream deep reinforcement learning algorithms: collision avoidance with deep reinforcement learning (CADRL), long short-term memory based deep reinforcement learning (LSTM-RL), and socially attentive reinforcement learning (SARL). Experimental results show that our model navigates more safely, outperforming the current models on key metrics such as success rate, collision rate, and hazard frequency.
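To make the reward-shaping idea concrete, the following is a minimal sketch of a per-step crowd-navigation reward with an added relative-speed penalty. The structure (goal reward, collision penalty, discomfort-zone penalty) follows the common CADRL-style formulation; the function name, the coefficients, and the exact form of the relative-speed term are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np

def shaped_reward(robot_pos, robot_vel, goal_pos, humans,
                  min_sep=0.6, discomfort=0.2, w_rel=0.05):
    """Per-step crowd-navigation reward with a relative-speed penalty.

    humans: list of (position, velocity) tuples as np.ndarray.
    min_sep: sum of robot and human radii (collision threshold).
    All constants here are illustrative placeholders.
    """
    # Reaching the goal yields a large positive terminal reward.
    if np.linalg.norm(robot_pos - goal_pos) < 0.3:
        return 1.0
    reward = 0.0
    for pos, vel in humans:
        gap = np.linalg.norm(robot_pos - pos) - min_sep
        if gap < 0:            # collision: large negative terminal reward
            return -0.25
        if gap < discomfort:   # inside the discomfort zone: graded penalty
            reward += -0.1 * (discomfort - gap)
            # Additional penalty proportional to the relative speed between
            # robot and pedestrian, so that close passes at high mutual
            # speed are punished more heavily than slow ones.
            rel_speed = np.linalg.norm(robot_vel - vel)
            reward += -w_rel * rel_speed
    return reward
```

Penalizing relative rather than absolute speed is what targets the failure mode described above: as pedestrians move faster, fast close approaches become more costly, pushing the learned policy toward slowing down or detouring near quick-moving humans.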