Merging onto the highway from an on-ramp is an essential scenario for automated driving. Decision-making in this scenario must balance safety and efficiency to optimize a long-term objective, which is challenging due to its dynamic, stochastic, and adversarial characteristics. Rule-based methods often lead to conservative driving on this task, while learning-based methods have difficulty meeting safety requirements. In this paper, we propose an RL-based end-to-end decision-making method under a framework of offline training and online correction, called the Shielded Distributional Soft Actor-Critic (SDSAC). The SDSAC adopts policy evaluation with safety consideration in its offline training and a safety shield parameterized with a barrier function in its online correction. These two measures reinforce each other for better safety without severely degrading efficiency. We verify the SDSAC on an on-ramp merge scenario in simulation. The results show that the SDSAC achieves the best safety performance among all compared baseline algorithms while simultaneously driving efficiently.
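To make the online-correction idea concrete, the following is a minimal, hypothetical sketch of a barrier-function safety shield, not the paper's actual formulation: all names, the one-dimensional car-following dynamics, and the parameters (`d_min`, `alpha`, `dt`) are illustrative assumptions. A discrete-time barrier function h(x) = gap − d_min defines the safe set, and the shield projects the RL policy's proposed acceleration onto actions satisfying h(x_{t+1}) ≥ (1 − alpha)·h(x_t).

```python
def barrier_shield(gap, closing_speed, a_rl, dt=0.1, d_min=5.0, alpha=0.5):
    """Filter an RL acceleration command through a discrete-time barrier
    condition (illustrative sketch, not the SDSAC implementation).

    gap           -- current distance to the leading vehicle [m]
    closing_speed -- rate at which the gap shrinks [m/s] (positive = closing)
    a_rl          -- acceleration proposed by the RL policy [m/s^2]
    """
    h = gap - d_min                       # barrier value: h > 0 means safe
    # Next-step gap under candidate acceleration a:
    #   gap' = gap - closing_speed*dt - 0.5*a*dt^2
    # The condition h(x') >= (1 - alpha)*h(x) yields an upper bound on a:
    a_max = (gap - closing_speed * dt - d_min - (1 - alpha) * h) / (0.5 * dt**2)
    # Shield: keep the RL action when it is safe, otherwise clip it down.
    return min(a_rl, a_max)
```

With a large gap the shield is inactive and passes the policy's action through unchanged; when the gap closes toward `d_min`, the bound `a_max` drops and forces braking regardless of what the policy proposes. This is the sense in which the offline safety-aware training and the online shield complement each other: the shield only intervenes when the learned policy's action would violate the barrier condition.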