We investigate and analyze the principles of typical motion planning algorithms, including traditional planning algorithms, supervised learning, optimal value reinforcement learning, and policy gradient reinforcement learning. The traditional planning algorithms investigated include graph search algorithms, sampling-based algorithms, and interpolating curve algorithms. The supervised learning algorithms include MSVM, LSTM, MCTS, and CNN. The optimal value reinforcement learning algorithms include Q-learning, DQN, double DQN, and dueling DQN. The policy gradient algorithms include the policy gradient method, the actor-critic algorithm, A3C, A2C, DPG, DDPG, TRPO, and PPO. New general criteria are also introduced to evaluate the performance and applicability of motion planning algorithms through analytical comparisons. The convergence speed and stability of optimal value and policy gradient algorithms are analyzed in particular. Future directions are presented according to the principles and analytical comparisons of these algorithms. This paper provides researchers with a clear and comprehensive understanding of the advantages, disadvantages, relationships, and future of motion planning algorithms in robotics, and paves the way for better motion planning algorithms.
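As a concrete reference point for the optimal value family named above, the sketch below shows a minimal tabular Q-learning loop. It is illustrative only and not taken from any surveyed paper: the Gym-style reset()/step() environment interface, the state/action counts, and the hyperparameters are all assumptions made for the example.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Minimal tabular Q-learning sketch (hypothetical env interface)."""
    Q = np.zeros((n_states, n_actions))  # action-value table Q(s, a)
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy exploration over discrete actions
            if np.random.rand() < epsilon:
                action = np.random.randint(n_actions)
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done, _ = env.step(action)
            # Bellman optimality update: move Q(s, a) toward the TD target
            td_target = reward + gamma * np.max(Q[next_state]) * (not done)
            Q[state, action] += alpha * (td_target - Q[state, action])
            state = next_state
    return Q
```

The convergence speed and stability properties the paper compares show up directly in this loop: the learning rate alpha and the greedy max over Q[next_state] are the levers that later variants such as DQN, double DQN, and dueling DQN modify.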