Lazy graph search algorithms are efficient at solving motion planning problems where edge evaluation is the computational bottleneck. These algorithms work by lazily computing the shortest potentially feasible path, evaluating edges along that path, and repeating until a feasible path is found. The order in which edges are selected is critical to minimizing the total number of edge evaluations: a good edge selector chooses edges that are not only likely to be invalid but that also eliminate many future paths from consideration. We wish to learn such a selector by leveraging prior experience. We formulate this problem as a Markov Decision Process (MDP) on the state of the search problem. While solving this large MDP is generally intractable, we show that we can compute oracular selectors that can solve the MDP during training. With access to such oracles, we use imitation learning to find effective policies. If new search problems are sufficiently similar to problems solved during training, the learned policy will choose a good edge evaluation ordering and solve the motion planning problem quickly. We evaluate our algorithms on a wide range of 2D and 7D problems and show that the learned selector outperforms commonly used heuristic baselines. We further provide a novel theoretical analysis of lazy search in a Bayesian framework as well as regret guarantees on our imitation-learning-based approach to motion planning.
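The lazy search loop described above (compute the shortest potentially feasible path, evaluate edges on it chosen by a selector, discard invalid edges, repeat) can be sketched as follows. This is a minimal illustrative implementation, not the paper's code: the graph representation, the `selector` interface, and the helper names are assumptions for the sake of the example.

```python
import heapq

def dijkstra(adj, start, goal):
    """Shortest path over the edges currently assumed valid (lazy optimism)."""
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(pq, (nd, v))
    if goal not in dist:
        return None
    path, u = [goal], goal
    while u != start:
        u = prev[u]
        path.append(u)
    return list(reversed(path))

def lazy_search(edges, start, goal, is_valid, selector):
    """LazySP-style loop (a sketch). `edges` maps (u, v) -> cost;
    `is_valid` is the expensive edge evaluator; `selector` picks which
    unevaluated edge on the candidate path to evaluate next -- this is
    the component the paper proposes to learn via imitation learning."""
    edges = dict(edges)
    evaluated = set()
    while True:
        adj = {}
        for (u, v), w in edges.items():
            adj.setdefault(u, []).append((v, w))
        path = dijkstra(adj, start, goal)
        if path is None:
            return None                      # no potentially feasible path left
        pending = [e for e in zip(path, path[1:]) if e not in evaluated]
        if not pending:
            return path                      # every edge on the path is verified
        e = selector(pending)                # learned or heuristic edge selector
        evaluated.add(e)
        if not is_valid(e):
            del edges[e]                     # eliminates all paths through this edge
```

A good `selector` minimizes the total number of `is_valid` calls; a simple baseline is to evaluate edges in path order (`lambda pending: pending[0]`), whereas the learned policy aims to pick edges likely to be invalid and to prune many alternative paths.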