Reliable pedestrian crash avoidance mitigation (PCAM) systems are crucial components of safe autonomous vehicles (AVs). The interdependent nature of the vehicle-pedestrian interaction, in which the decisions of one agent directly affect the other agent's optimal behavior and vice versa, is a challenging yet often neglected aspect of such systems. We address this issue by defining a Markov decision process (MDP) for a simulated driving scenario in which an AV driving along an urban street faces a pedestrian trying to cross at an unmarked crosswalk. The AV's PCAM decision policy is learned through deep reinforcement learning (DRL). Since modeling pedestrians realistically is challenging, we compare two levels of intelligent pedestrian behavior. While the baseline model follows a predefined strategy, our advanced pedestrian model is defined as a second DRL agent. This model captures continuous learning and the uncertainty inherent in human behavior, making the vehicle-pedestrian interaction a deep multi-agent reinforcement learning (DMARL) problem. We benchmark the developed PCAM systems according to the agents' collision rate and the resulting traffic flow efficiency, with a focus on how observation uncertainty (noise) influences the agents' decision-making. The results show that the AV is able to completely mitigate collisions under the majority of the investigated conditions, and that the DRL pedestrian model indeed learns a more intelligent crossing behavior.