Optimal trajectory planning requires obstacle avoidance, and path planning is the key to its success. Due to their computational demands, most path planning algorithms cannot be employed in real-time applications. Model-based reinforcement learning approaches to path planning have achieved some success in the recent past, yet most do not produce deterministic output owing to their stochastic nature. We analyzed several reinforcement learning-based approaches to path planning: a deterministic tree-based approach and two approaches based on Q-learning and approximate policy gradient, respectively. We tested these approaches on two different types of simulators, each of which contains a set of random obstacles that can be changed or moved dynamically. After analyzing the results and computation times, we concluded that the deterministic tree search approach provides highly accurate results; however, its computation time is considerably higher than that of the other two approaches. Finally, comparative results are provided in terms of accuracy and computation time as evidence.