Interaction-aware planning for autonomous driving requires exploring a combinatorial solution space when conventional search- or optimization-based motion planners are used. With Deep Reinforcement Learning, optimal driving strategies can be derived even for such higher-dimensional problems. However, these methods guarantee optimality of the resulting policy only in a statistical sense, which impedes their use in safety-critical systems such as autonomous vehicles. We therefore propose the Experience-Based-Heuristic-Search algorithm, which overcomes the statistical failure rate of a Deep-Reinforcement-Learning-based planner while still benefiting computationally from the pre-learned optimal policy. Specifically, we show how experiences in the form of a Deep Q-Network can be integrated as a heuristic into a heuristic search algorithm. We benchmark our algorithm on path planning in semi-structured valet parking scenarios, where we analyze the accuracy of such estimates and demonstrate the computational advantages and robustness of our method. Our method may encourage further investigation into the applicability of reinforcement-learning-based planning in the field of self-driving vehicles.
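The core idea described above, using a learned value estimate as the heuristic inside a complete search, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function `learned_cost_to_go` is a hypothetical stand-in for a trained Deep Q-Network (in a setting with negative step costs, the cost-to-go estimate would be `-max_a Q(s, a)`); here it is faked with the Manhattan distance on a toy grid. The point of the wrapper is that even an inaccurate learned estimate only biases node expansion order, while the surrounding search retains completeness, which is what overcomes the policy's statistical failure rate.

```python
import heapq

# Hypothetical stand-in for a trained Deep Q-Network's cost-to-go estimate.
# A real network's output would be approximate; the search below tolerates
# such errors, merely expanding more nodes when the estimate is poor.
def learned_cost_to_go(state, goal):
    return abs(state[0] - goal[0]) + abs(state[1] - goal[1])

def experience_based_heuristic_search(start, goal, obstacles, size):
    """A*-style search on a size x size grid, using the learned estimate
    as heuristic h(s). Returns a path (list of states) or None."""
    # Priority queue entries: (f = g + h, g, state, path so far)
    open_list = [(learned_cost_to_go(start, goal), 0, start, [start])]
    closed = set()
    while open_list:
        _, g, state, path = heapq.heappop(open_list)
        if state == goal:
            return path
        if state in closed:
            continue
        closed.add(state)
        x, y = state
        for succ in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = succ
            if 0 <= nx < size and 0 <= ny < size and succ not in obstacles:
                h = learned_cost_to_go(succ, goal)
                heapq.heappush(open_list, (g + 1 + h, g + 1, succ, path + [succ]))
    # Unlike a raw learned policy, the search can certify that no path exists.
    return None
```

In the paper's parking domain the states would be vehicle configurations and the successor set would come from motion primitives; the grid here only serves to keep the sketch self-contained.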