Model information can be used to predict future trajectories, so it has great potential to help avoid dangerous regions when applying reinforcement learning (RL) to real-world tasks such as autonomous driving. However, existing studies mostly use model-free constrained RL, which inevitably causes constraint violations. This paper proposes a model-based feasibility enhancement technique for constrained RL, which enhances the feasibility of the policy using a generalized control barrier function (GCBF) defined on the distance to the constraint boundary. By using the model information, the policy can be optimized safely without violating actual safety constraints, and sample efficiency is increased. The major difficulty of infeasibility in solving the constrained policy gradient is handled by an adaptive coefficient mechanism. We evaluate the proposed method on a complex autonomous driving collision avoidance task in both simulations and real-vehicle experiments. The proposed method achieves up to four times fewer constraint violations and converges 3.36 times faster than baseline constrained RL approaches.
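For intuition, the sketch below illustrates the kind of model-based GCBF screening the abstract describes on a toy one-dimensional system. The model f, safety measure h, decay rate lam, discrete-time GCBF condition h(x') - h(x) >= -lam * h(x), and the coefficient update rule are all illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Toy 1-D kinematic model: state x is the distance to an obstacle ahead,
# action u is a velocity command; DT is the step length. All of this is
# a hypothetical stand-in for the paper's dynamics model.
DT = 0.1

def f(x, u):
    """One-step model prediction x_{t+1} = x_t + u * DT."""
    return x + u * DT

def h(x):
    """Safety measure: signed distance to the constraint boundary.
    h(x) >= 0 means the state satisfies the constraint."""
    return x  # the constraint is simply x >= 0 in this toy example

def gcbf_ok(x, u, lam=0.2):
    """Discrete-time GCBF condition h(x') - h(x) >= -lam * h(x):
    h may decrease, but only at a rate bounded by lam, so the
    predicted trajectory never crosses the constraint boundary."""
    return h(f(x, u)) - h(x) >= -lam * h(x)

# Screen candidate actions with the model before applying any of them,
# so no actual constraint violation is needed to learn the boundary.
x = 1.0
candidates = np.linspace(-5.0, 0.0, 11)
safe = [u for u in candidates if gcbf_ok(x, u)]
print("safe actions:", safe)

# A simple adaptive coefficient (assumed, dual-ascent-style heuristic):
# grow the penalty weight on the GCBF term when the predicted condition
# is violated, and let it decay slowly when it is satisfied.
coef, lr, decay, lam = 1.0, 0.5, 0.01, 0.2
for u in candidates:
    violation = max(0.0, -(h(f(x, u)) - (1 - lam) * h(x)))
    coef = max(0.0, coef + lr * violation - decay)
print("adapted coefficient:", round(coef, 3))
```

Running the sketch shows only the mildly braking actions (u >= -2 here) pass the GCBF check: the condition rules out any action whose predicted next state erodes the safety margin faster than the lam-bounded rate, which is the mechanism by which the model lets the policy stay feasible during optimization.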