This study proposes a safe and sample-efficient reinforcement learning (RL) framework to address two major challenges in developing applicable RL algorithms: satisfying safety constraints and learning efficiently with limited samples. To guarantee safety in complex real-world environments, we use the safe set algorithm (SSA) to monitor and modify the nominal controls, and evaluate SSA+RL in a clustered dynamic environment that is challenging for existing RL algorithms to solve. However, the SSA+RL framework is usually not sample-efficient, especially in reward-sparse environments, an issue that has not been addressed in previous safe RL work. To improve learning efficiency, we propose three techniques: (1) avoiding overly conservative behavior by adapting the SSA; (2) encouraging safe exploration using random network distillation with safety constraints; (3) improving policy convergence by treating SSA modifications as expert demonstrations and learning from them directly. The experimental results show that our framework achieves better safety performance than other safe RL methods during training and solves the task in substantially fewer episodes. Project website: https://hychen-naza.github.io/projects/Safe_RL/.
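To make the monitor-and-modify idea concrete, the sketch below shows how a safety-index-based filter in the spirit of SSA can override a policy's nominal control. This is a minimal illustration, not the paper's implementation: the function name, arguments, and the single-constraint closed-form projection are assumptions made for clarity; the actual SSA formulation and environment dynamics are described in the paper.

```python
import numpy as np

def ssa_monitor(u_nominal, phi, grad_phi, f, g, eta=0.1):
    """Project a nominal RL control onto the safe halfspace (illustrative sketch).

    u_nominal -- nominal control from the RL policy, shape (m,)
    phi       -- safety index at the current state (phi >= 0 means unsafe)
    grad_phi  -- gradient of the safety index w.r.t. the state, shape (n,)
    f, g      -- control-affine dynamics terms: x_dot = f + g @ u
    eta       -- required decrease rate of phi when the constraint is active
    """
    if phi < 0:
        # State is inside the safe set; keep the policy's action unchanged.
        return u_nominal
    # Enforce d(phi)/dt = grad_phi @ (f + g @ u) <= -eta by projecting the
    # nominal control onto the nearest point of the constraint halfspace.
    a = grad_phi @ g                 # coefficient vector multiplying u
    b = -eta - grad_phi @ f          # required upper bound on a @ u
    violation = a @ u_nominal - b
    if violation <= 0:
        return u_nominal
    return u_nominal - violation * a / (a @ a + 1e-8)


# Example: a point robot whose nominal action pushes it toward an obstacle.
u_rl = np.array([1.0, 0.0])                       # action proposed by the policy
u_safe = ssa_monitor(u_rl,
                     phi=0.2,                     # safety index currently violated
                     grad_phi=np.array([1.0, 0.0]),
                     f=np.zeros(2),
                     g=np.eye(2))
print(u_safe)  # modified control satisfying the safety constraint
```

In the framework described above, the pairs of states and modified controls produced by such a filter are what can be treated as expert demonstrations for the policy to learn from directly.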