Autonomous driving has been at the forefront of public interest, and a pivotal concern in this debate is the safety of the transportation system. Deep reinforcement learning (DRL) has been applied to autonomous driving to provide solutions for obstacle avoidance. However, in a road traffic junction scenario, the vehicle typically receives only partial observations of the transportation environment, while DRL must rely on long-term rewards to train a reliable model by maximising the cumulative reward; this can be risky, as exploring new actions returns either a positive reward or a penalty in the case of a collision. Although safety concerns are usually incorporated into the design of the reward function, they are rarely treated as the critical metric for directly evaluating the effectiveness of DRL algorithms in autonomous driving. In this study, we evaluated the safety performance of three baseline DRL models (DQN, A2C, and PPO) and proposed a self-awareness module, based on an attention mechanism, to improve the safety evaluation of an anomalous vehicle in complex road traffic junction environments, such as intersection and roundabout scenarios, using four metrics: collision rate, success rate, freezing rate, and total reward. Our experimental results in both the training and testing phases revealed poor safety performance for the baseline DRL models, whereas our proposed self-awareness attention-DQN significantly improved safety performance in both the intersection and roundabout scenarios.