Reliable pedestrian crash avoidance mitigation (PCAM) systems are crucial components of safe autonomous vehicles (AVs). The nature of the vehicle-pedestrian interaction, where the decisions of one agent directly affect the other agent's optimal behavior and vice versa, is a challenging yet often neglected aspect of such systems. We address this issue by modeling the simulated AV-pedestrian interaction at an unmarked crosswalk as a Markov decision process (MDP). The AV's PCAM decision policy is learned through deep reinforcement learning (DRL). Since modeling pedestrians realistically is challenging, we compare two levels of intelligent pedestrian behavior. While the baseline model follows a predefined strategy, our advanced pedestrian model is defined as a second DRL agent. This model captures the continuous learning and the uncertainty inherent in human behavior, making the AV-pedestrian interaction a deep multi-agent reinforcement learning (DMARL) problem. We benchmark the developed PCAM systems according to their collision rate and the resulting traffic flow efficiency, with a focus on the influence of observation uncertainty on the agents' decision-making. The results show that the AV is able to completely mitigate collisions under the majority of the investigated conditions and that the DRL-based pedestrian model learns an intelligent crossing behavior.
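To make the described setup more concrete, the following is a minimal, hypothetical Python sketch of the simulated AV-pedestrian interaction loop at an unmarked crosswalk. It is not the implementation used in this work: the state variables, kinematics, thresholds, the gap-acceptance rule of the baseline pedestrian, and the hand-coded placeholder AV policy (which stands in for the DRL-learned PCAM policy) are all illustrative assumptions, and no reward definition or learning is shown.

# Hypothetical sketch of the AV-pedestrian interaction at an unmarked
# crosswalk as a two-agent simulation loop; all numbers and rules are
# illustrative placeholders, not the models trained in this work.

import random
from dataclasses import dataclass

DT = 0.1           # simulation time step [s]
CROSSWALK_X = 0.0  # longitudinal position of the crosswalk [m]
LANE_Y = 0.0       # lateral position of the AV's lane centerline [m]

@dataclass
class State:
    av_x: float   # AV position relative to the crosswalk (negative = approaching) [m]
    av_v: float   # AV speed [m/s]
    ped_y: float  # pedestrian lateral distance to the AV's lane [m]
    ped_v: float  # pedestrian walking speed [m/s]

def av_policy(obs: State) -> float:
    """Placeholder PCAM policy: brake if the pedestrian is close to the lane
    while the AV is about to reach the crosswalk. In the paper this mapping
    is learned with DRL rather than hand-coded."""
    time_to_crosswalk = abs(obs.av_x) / max(obs.av_v, 0.1)
    if abs(obs.ped_y) < 2.0 and time_to_crosswalk < 3.0:
        return -4.0  # decelerate [m/s^2]
    return 1.0       # otherwise accelerate toward the speed limit

def baseline_pedestrian(obs: State) -> float:
    """Baseline pedestrian with a predefined gap-acceptance rule: start (or
    keep) crossing only if the AV is far away or the road has been entered."""
    return 1.4 if abs(obs.av_x) > 20.0 or obs.ped_y < 0.0 else 0.0

def step(s: State, av_accel: float, ped_speed: float) -> State:
    """Simple kinematic transition; observation noise could be injected here
    to study the influence of uncertainty on decision-making."""
    av_v = max(0.0, min(s.av_v + av_accel * DT, 13.9))  # cap at ~50 km/h
    return State(s.av_x + av_v * DT, av_v, s.ped_y - ped_speed * DT, ped_speed)

def run_episode() -> bool:
    """Roll out one interaction; returns True if a collision occurred."""
    s = State(av_x=random.uniform(-60.0, -40.0), av_v=13.9,
              ped_y=random.uniform(2.0, 4.0), ped_v=0.0)
    for _ in range(300):
        s = step(s, av_policy(s), baseline_pedestrian(s))
        if abs(s.av_x - CROSSWALK_X) < 1.0 and abs(s.ped_y - LANE_Y) < 1.0:
            return True   # AV and pedestrian occupy the crossing area together
        if s.av_x > 5.0 or s.ped_y < -3.0:
            return False  # episode ends once either agent has passed
    return False

if __name__ == "__main__":
    collisions = sum(run_episode() for _ in range(100))
    print(f"collision rate over 100 episodes: {collisions / 100:.2%}")

In the DMARL variant described above, baseline_pedestrian would be replaced by a second learned policy, and the collision indicator together with a traffic-flow measure would feed the benchmark metrics.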