Safe reinforcement learning (RL) has achieved significant success on risk-sensitive tasks and shown promise in autonomous driving (AD) as well. However, given the distinctiveness of this community, efficient and reproducible baselines for safe AD are still lacking. In this paper, we release SafeRL-Kit to benchmark safe RL methods for AD-oriented tasks. Concretely, SafeRL-Kit contains several of the latest algorithms tailored to zero-constraint-violation tasks, including Safety Layer, Recovery RL, the off-policy Lagrangian method, and Feasible Actor-Critic. In addition to these existing approaches, we propose a novel first-order method named Exact Penalty Optimization (EPO) and demonstrate its capability for safe AD. All algorithms in SafeRL-Kit are implemented (i) under the off-policy setting, which improves sample efficiency and better leverages past logs; and (ii) within a unified learning framework that provides off-the-shelf interfaces for researchers to incorporate their domain-specific knowledge into fundamental safe RL methods. Finally, we conduct a comparative evaluation of the above algorithms in SafeRL-Kit and shed light on their efficacy for safe autonomous driving. The source code is available at \url{https://github.com/zlr20/saferl_kit}.
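As a rough illustration of the idea behind EPO (a sketch of the standard exact-penalty reformulation, not the paper's exact objective; the symbols $J_r$, $J_c$, $d$, and $\kappa$ are our own notation), the cost constraint of a constrained RL problem is folded into the objective via a ReLU-shaped penalty:
\[
\max_{\theta}\; J_r(\pi_\theta) \;-\; \kappa \cdot \max\!\bigl(0,\; J_c(\pi_\theta) - d\bigr),
\]
where $J_r$ is the expected return, $J_c$ the expected cumulative cost, $d$ the cost budget, and $\kappa$ a single fixed penalty coefficient. For a sufficiently large $\kappa$, optima of this unconstrained surrogate coincide with those of the original constrained problem, which is why a first-order method can optimize it directly.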