Intelligent robots offer a new route to improving efficiency in industrial and service scenarios by replacing human labor. However, these scenarios contain dense and dynamic obstacles that make robot motion planning challenging. Traditional algorithms such as A* can plan collision-free trajectories in static environments, but their performance degrades and their computational cost rises steeply in dense and dynamic scenarios. Optimal-value reinforcement learning (RL) algorithms can address these problems but suffer from slow and unstable network convergence. Policy-gradient RL networks converge quickly in Atari games, where the action space is discrete and finite, but little work has addressed problems that require continuous actions and large action spaces. In this paper, we modify the existing advantage actor-critic algorithm and adapt it to complex motion planning, so that optimal speeds and headings of the robot are generated. Experimental results demonstrate that our algorithm converges faster and more stably than optimal-value RL, and achieves a higher success rate in motion planning with less processing time for the robot to reach its goal.
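To make the adaptation concrete, the following is a minimal sketch of an advantage actor-critic update with a Gaussian policy over continuous robot actions (speed and heading). The network sizes, state dimensionality, and training interface are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal A2C sketch for continuous actions (speed, heading).
# state_dim, hidden sizes, and coefficients are assumed values for illustration.
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, state_dim=24, action_dim=2, hidden=128):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, action_dim)               # mean of (speed, heading)
        self.log_std = nn.Parameter(torch.zeros(action_dim))  # learned log std dev
        self.value = nn.Linear(hidden, 1)                     # state-value estimate

    def forward(self, state):
        h = self.shared(state)
        return self.mu(h), self.log_std.exp(), self.value(h)

def a2c_update(model, optimizer, states, actions, returns, entropy_coef=0.01):
    """One A2C gradient step on a batch of (state, action, return) samples."""
    mu, std, values = model(states)
    dist = torch.distributions.Normal(mu, std)
    log_probs = dist.log_prob(actions).sum(dim=-1)
    advantages = returns - values.squeeze(-1)                 # advantage estimate
    policy_loss = -(log_probs * advantages.detach()).mean()   # policy-gradient term
    value_loss = advantages.pow(2).mean()                     # critic regression term
    entropy = dist.entropy().sum(dim=-1).mean()               # exploration bonus
    loss = policy_loss + 0.5 * value_loss - entropy_coef * entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Sampling from the Gaussian policy directly yields continuous speed and heading commands, which is the property that distinguishes this setup from discrete-action Atari-style policy gradients.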