Prediction-aware and Reinforcement Learning based Altruistic Cooperative Driving

Autonomous vehicle (AV) navigation in the presence of human-driven vehicles (HVs) is challenging, as HVs continuously update their policies in response to AVs. To navigate safely amid complex AV-HV social interactions, the AVs must learn to predict these changes. Humans can navigate such challenging social interaction settings because they have intrinsic knowledge of other agents' behaviors and use it to forecast what might happen in the future. Inspired by humans, we give our AVs the ability to anticipate future states and to leverage prediction in a cooperative reinforcement learning (RL) decision-making framework, improving safety and robustness. In this paper, we propose an integration of two essential and previously presented components of AVs: social navigation and prediction. We formulate the AV decision-making process as an RL problem and seek optimal policies that produce socially beneficial results, using an RL framework that combines prediction-aware planning with social-aware optimization. We also propose a Hybrid Predictive Network (HPN) that anticipates future observations. The HPN is used in a multi-step prediction chain to compute a window of predicted future observations, which is then consumed by the value function network (VFN). Finally, a safe VFN is trained to optimize a social utility using a sequence of previous and predicted observations, and a safety prioritizer leverages the interpretable kinematic predictions to mask unsafe actions, constraining the RL policy. We compare our prediction-aware AV to state-of-the-art solutions and demonstrate performance improvements in terms of efficiency and safety in multiple simulated scenarios.
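The two mechanisms named in the abstract, a multi-step prediction chain and a mask over unsafe actions, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `predict_window`, `masked_greedy_action`, and `constant_velocity` are hypothetical, the toy constant-velocity extrapolator merely stands in for the HPN, and the list of action values stands in for the output of the VFN.

```python
from typing import Callable, List, Sequence

Observation = List[float]


def predict_window(
    history: Sequence[Observation],
    predictor: Callable[[Sequence[Observation]], Observation],
    horizon: int,
) -> List[Observation]:
    """Chain a one-step predictor to produce `horizon` future observations.

    Each prediction is fed back as input for the next step, so later
    predictions condition on earlier ones (an autoregressive rollout).
    """
    window = list(history)
    predicted: List[Observation] = []
    for _ in range(horizon):
        nxt = predictor(window)
        predicted.append(nxt)
        window = window[1:] + [nxt]  # slide the fixed-length input window
    return predicted


def masked_greedy_action(q_values: Sequence[float], safe: Sequence[bool]) -> int:
    """Return the index of the highest-valued action among those flagged safe."""
    candidates = [i for i, ok in enumerate(safe) if ok]
    if not candidates:
        raise ValueError("no safe action available")
    return max(candidates, key=lambda i: q_values[i])


def constant_velocity(window: Sequence[Observation]) -> Observation:
    """Toy stand-in predictor (assumption): extrapolate the last observed change."""
    prev, last = window[-2], window[-1]
    return [2 * b - a for a, b in zip(prev, last)]


if __name__ == "__main__":
    # Hypothetical observations: [longitudinal position, speed] of one vehicle.
    history = [[0.0, 10.0], [1.0, 10.0], [2.0, 10.0]]
    print(predict_window(history, constant_velocity, horizon=3))
    # Action 1 has the highest value but is masked out as unsafe.
    print(masked_greedy_action([1.0, 5.0, 3.0], [True, False, True]))
```

In this sketch the mask is supplied by the caller. In the framework described above, it would be derived from the kinematic predictions, so that the policy can only choose among actions judged safe over the predicted window.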
