Tree-structured Policy Planning with Learned Behavior Models

Autonomous vehicles (AVs) need to reason about the multimodal behavior of neighboring agents while planning their own motion. Many existing trajectory planners seek a single trajectory that performs well under \emph{all} plausible futures simultaneously, ignoring bi-directional interactions and thus producing overly conservative plans. Policy planning, whereby the ego agent plans a policy that reacts to the environment's multimodal behavior, is a promising direction because it can account for the action-reaction interactions between the AV and the environment. However, most existing policy planners do not scale to the complexity of real autonomous vehicle applications: they are either incompatible with modern deep learning prediction models, not interpretable, or unable to generate high-quality trajectories. To fill this gap, we propose Tree Policy Planning (TPP), a policy planner that is compatible with state-of-the-art deep learning prediction models, generates multistage motion plans, and accounts for the influence of the ego agent on the environment's behavior. The key idea of TPP is to reduce the continuous optimization problem to a tractable discrete Markov Decision Process (MDP) through the construction of two tree structures: an ego trajectory tree for ego trajectory options, and a scenario tree for multimodal ego-conditioned environment predictions. We demonstrate the efficacy of TPP in closed-loop simulations based on the real-world nuScenes dataset, and the results show that TPP scales to realistic AV scenarios and significantly outperforms non-policy baselines.
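To make the tree-based reduction concrete, the following is a minimal toy sketch of backward induction over an ego trajectory tree paired with a scenario tree. It is not the authors' implementation: the `Node` class, the `COST` and `PROB` tables, the node names, and all numbers are hypothetical and chosen only for illustration. The sketch compares a policy, which picks the next ego segment after observing which scenario branch occurs, with a non-policy baseline that commits to the next ego segment before observing the branch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in either tree: one ego trajectory segment or one predicted scenario."""
    name: str
    children: list = field(default_factory=list)


# Hypothetical two-stage trees: the ego can yield or go; the other agent can yield or go.
ego_root = Node("start", [Node("yield"), Node("go")])
scen_root = Node("now", [Node("other_yields"), Node("other_goes")])

# Hypothetical stage costs for each (ego segment, scenario) pair.
COST = {
    ("start", "now"): 0.0,
    ("yield", "other_yields"): 2.0,
    ("yield", "other_goes"): 1.0,
    ("go", "other_yields"): 0.0,
    ("go", "other_goes"): 10.0,
}

# Hypothetical branching probabilities, keyed by the current ego segment so that
# predictions can be ego-conditioned: P(next scenario | current ego segment).
PROB = {
    ("start", "other_yields"): 0.5,
    ("start", "other_goes"): 0.5,
}


def policy_value(ego, scen):
    """Expected cost when the next ego segment is chosen AFTER observing the branch."""
    stage = COST[(ego.name, scen.name)]
    if not ego.children or not scen.children:
        return stage
    expected = 0.0
    for s in scen.children:
        best = min(policy_value(e, s) for e in ego.children)
        expected += PROB[(ego.name, s.name)] * best
    return stage + expected


def committed_value(ego, scen):
    """Baseline: the next ego segment is chosen BEFORE observing the branch."""
    stage = COST[(ego.name, scen.name)]
    if not ego.children or not scen.children:
        return stage
    best = min(
        sum(PROB[(ego.name, s.name)] * committed_value(e, s) for s in scen.children)
        for e in ego.children
    )
    return stage + best


print(policy_value(ego_root, scen_root))     # 0.5
print(committed_value(ego_root, scen_root))  # 1.5
```

In this toy setting the reactive policy has a lower expected cost (0.5) than the committed plan (1.5), because it can go when the other agent yields and yield when the other agent goes. The committed plan has to hedge against both branches at once, which illustrates the conservatism described above.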
