Uncertainty in human behavior poses a significant challenge to autonomous driving in crowded urban environments. Partially observable Markov decision processes (POMDPs) offer a principled framework for planning under uncertainty, often leveraging Monte Carlo sampling to achieve online performance on complex tasks. However, sampling also raises safety concerns, because it can miss critical events. To address this, we propose a new algorithm, LEarning Attention over Driving bEhavioRs (LEADER), that learns to attend to critical human behaviors during planning. LEADER learns a neural network generator that provides attention over human behaviors in real-time situations. It integrates the attention into a belief-space planner, using importance sampling to bias reasoning towards critical events. To train the algorithm, we let the attention generator and the planner form a min-max game. By solving the min-max game, LEADER learns to perform risk-aware planning without human labeling.
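The role of importance sampling here can be illustrated with a minimal sketch. This is not LEADER's implementation: the behavior names, the fixed `attention` distribution (standing in for the learned generator's output), and the `value_fn` costs are all hypothetical. The sketch only shows how sampling from an attention distribution and reweighting by belief over attention keeps the value estimate unbiased while drawing rare, critical behaviors far more often.

```python
import random


def importance_sampled_value(belief, attention, value_fn, n_samples, rng):
    """Estimate E_{b ~ belief}[value_fn(b)] by sampling b ~ attention.

    Each sample is reweighted by belief[b] / attention[b], so the estimate
    stays unbiased as long as attention[b] > 0 wherever belief[b] > 0.
    """
    behaviors = list(belief)
    weights = [attention[b] for b in behaviors]
    total = 0.0
    for _ in range(n_samples):
        b = rng.choices(behaviors, weights=weights, k=1)[0]
        total += (belief[b] / attention[b]) * value_fn(b)
    return total / n_samples


# Hypothetical two-behavior example: a rare but costly "cut_in" event.
belief = {"yield": 0.98, "cut_in": 0.02}      # assumed belief over behaviors
attention = {"yield": 0.5, "cut_in": 0.5}     # assumed attention (sampling) distribution
costs = {"yield": 0.0, "cut_in": -100.0}      # assumed outcome values


def value_fn(behavior):
    return costs[behavior]


exact = sum(belief[b] * value_fn(b) for b in belief)
estimate = importance_sampled_value(
    belief, attention, value_fn, n_samples=10_000, rng=random.Random(0)
)
```

Sampling directly from `belief` would draw `cut_in` in only about 2% of samples, so a small sample budget could miss it entirely. Under the assumed `attention`, about half the samples cover it, and the weights correct for the oversampling.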