Safety has been recognized as a central obstacle preventing the deployment of reinforcement learning (RL) in real-world applications, and a variety of methods have been developed to address safety concerns in RL. However, learning reliable RL-based solutions usually requires a large number of interactions with the environment. Moreover, how to improve learning efficiency, specifically, how to utilize transfer learning for safe reinforcement learning, has not been well studied. In this work, we propose an adaptive aggregation framework for safety-critical control. Our method comprises two key techniques: 1) we learn to transfer safety knowledge by aggregating multiple source tasks and a target task through an attention network; 2) we decouple the goal of improving task performance from that of reducing constraint violations by employing a safeguard. Experimental results demonstrate that our algorithm achieves fewer safety violations while showing better data efficiency than several baselines.
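As a rough illustration of the two techniques named above, the sketch below shows one plausible way to (1) compute attention weights over source-task representations and fuse them with a target-task representation, and (2) apply a simple safeguard that projects an action back onto a linearized cost constraint. The module names, dimensions, and the linear form of the safeguard are our own assumptions for illustration; the abstract does not specify the actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentiveAggregator(nn.Module):
    """Illustrative attention-based aggregation of source-task features
    with a target-task feature. Names and dimensions are assumptions,
    not taken from the paper."""

    def __init__(self, feat_dim: int):
        super().__init__()
        # Query comes from the target task; keys come from the sources.
        self.query_proj = nn.Linear(feat_dim, feat_dim)
        self.key_proj = nn.Linear(feat_dim, feat_dim)

    def forward(self, target_feat, source_feats):
        # target_feat: (batch, feat_dim); source_feats: (batch, S, feat_dim)
        q = self.query_proj(target_feat).unsqueeze(1)      # (batch, 1, d)
        k = self.key_proj(source_feats)                    # (batch, S, d)
        scores = (q * k).sum(-1) / k.shape[-1] ** 0.5      # (batch, S)
        weights = F.softmax(scores, dim=-1)                # attention over sources
        # Weighted sum of source features, fused with the target feature.
        aggregated = (weights.unsqueeze(-1) * source_feats).sum(dim=1)
        return target_feat + aggregated, weights


def safeguard(action, cost_grad, budget):
    """Illustrative safeguard: if the linearized cost of `action` exceeds
    `budget`, project the action onto the constraint boundary. This is a
    standard safety-layer projection standing in for the paper's
    safeguard, whose exact form the abstract does not give."""
    cost = (action * cost_grad).sum(-1, keepdim=True)
    violation = (cost - budget).clamp(min=0.0)
    correction = violation * cost_grad / (cost_grad.pow(2).sum(-1, keepdim=True) + 1e-8)
    return action - correction


if __name__ == "__main__":
    agg = AttentiveAggregator(feat_dim=16)
    tgt = torch.randn(4, 16)                 # target-task features
    src = torch.randn(4, 3, 16)              # features from 3 source tasks
    fused, w = agg(tgt, src)
    print(fused.shape, w.shape)              # (4, 16), (4, 3)

    a = torch.randn(4, 6)                    # proposed actions
    g = torch.randn(4, 6)                    # assumed-known cost gradient
    safe_a = safeguard(a, g, budget=torch.zeros(4, 1))
```

Note that the safeguard runs after the policy proposes an action, which is what lets task-performance learning and constraint satisfaction be handled separately.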