Autonomous vehicles (AVs) need to reason about the multimodal behavior of neighboring agents while planning their own motion. Many existing trajectory planners seek a single trajectory that performs well under \emph{all} plausible futures simultaneously, ignoring bi-directional interactions and thus leading to overly conservative plans. Policy planning, whereby the ego agent plans a policy that reacts to the environment's multimodal behavior, is a promising direction, as it can account for the action-reaction interactions between the AV and the environment. However, most existing policy planners do not scale to the complexity of real autonomous vehicle applications: they are either incompatible with modern deep-learning prediction models, not interpretable, or unable to generate high-quality trajectories. To fill this gap, we propose Tree Policy Planning (TPP), a policy planner that is compatible with state-of-the-art deep-learning prediction models, generates multistage motion plans, and accounts for the influence of the ego agent on the environment's behavior. The key idea of TPP is to reduce the continuous optimization problem to a tractable discrete Markov decision process (MDP) through the construction of two tree structures: an ego trajectory tree for ego trajectory options, and a scenario tree for multimodal ego-conditioned environment predictions. We demonstrate the efficacy of TPP in closed-loop simulations based on the real-world nuScenes dataset; the results show that TPP scales to realistic AV scenarios and significantly outperforms non-policy baselines.
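To make the two-tree construction concrete, the following is a minimal Python sketch of our own (not the authors' implementation): the names \texttt{EgoNode}, \texttt{ScenarioNode}, and \texttt{backup}, and the user-supplied \texttt{reward} function, are all hypothetical. It illustrates the expectimax-style value backup on the discrete MDP induced by pairing branches of the ego trajectory tree with branches of the scenario tree; the per-mode \texttt{max} over ego children is what encodes a reactive policy rather than a single fixed trajectory.

\begin{verbatim}
# Hypothetical sketch of TPP's two-tree idea, assuming stage-aligned
# trees and a user-supplied stage reward; not the authors' code.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class EgoNode:
    """One candidate ego trajectory segment for a planning stage."""
    traj: object                      # e.g., array of (x, y, heading) states
    children: List["EgoNode"] = field(default_factory=list)

@dataclass
class ScenarioNode:
    """One ego-conditioned prediction mode for the other agents."""
    prediction: object                # e.g., predicted agent trajectories
    prob: float                       # mode probability (children sum to 1)
    children: List["ScenarioNode"] = field(default_factory=list)

def backup(ego: EgoNode, scen: ScenarioNode,
           reward: Callable[[object, object], float]) -> float:
    """Finite-horizon DP value of executing ego.traj under mode scen,
    then reacting optimally at the next stage (a policy, not a
    single trajectory)."""
    r = reward(ego.traj, scen.prediction)
    if not ego.children or not scen.children:
        return r
    # Expectation over next-stage scenario modes; for each mode the
    # ego is free to pick its best child branch.
    future = sum(
        s.prob * max(backup(e, s, reward) for e in ego.children)
        for s in scen.children
    )
    return r + future
\end{verbatim}

Under this reading, evaluating \texttt{backup} at the tree roots scores each first-stage ego option, and the argmax gives the first trajectory segment to execute while the remaining branches constitute the contingency policy.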