Vanilla Reinforcement Learning (RL) can efficiently solve complex tasks but does not provide any guarantees on system behavior. Yet, for real-world systems, which are often safety-critical, such guarantees on safety specifications are necessary. To bridge this gap, we propose a safe RL procedure for continuous action spaces with verified probabilistic guarantees specified via temporal logic. First, our approach probabilistically verifies a candidate controller with respect to a temporal logic specification while randomizing the controller's inputs within an expansion set. Then, we use RL to improve the performance of this probabilistically verified controller, exploring within the given expansion set around the controller's inputs. Finally, we compute probabilistic safety guarantees with respect to the temporal logic specification for the learned agent. Our approach is efficiently implementable for continuous action and state spaces, and it separates safety verification and performance improvement into two distinct steps. We evaluate our approach on an evasion task in which a robot has to reach a goal while evading a dynamic obstacle with a specific maneuver. Our results show that our safe RL approach leads to efficient learning while probabilistically maintaining safety specifications.
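To make the verification step concrete, the following is a minimal, self-contained sketch of Monte Carlo verification of a candidate controller whose actions are randomized within an expansion set, estimating the probability that a simple reach-avoid specification holds. It is not the authors' implementation: the single-integrator dynamics, the goal and obstacle regions, the proportional controller, and the expansion-set half-width `EPS` are all illustrative assumptions.

```python
# Illustrative sketch (assumed setup, not the paper's code): estimate the probability
# that a controller, with inputs randomized inside an expansion set, satisfies a
# reach-avoid specification ("eventually reach the goal, always avoid the obstacle").
import numpy as np

DT, HORIZON, EPS = 0.1, 100, 0.2          # step size, episode length, expansion half-width (assumed)
GOAL, GOAL_R = np.array([5.0, 5.0]), 0.5  # goal centre and radius (assumed)
OBS, OBS_R = np.array([2.5, 2.5]), 0.8    # obstacle centre and radius (assumed)

def nominal_controller(x):
    """Candidate controller: velocity command toward the goal, capped at unit speed."""
    v = GOAL - x
    n = np.linalg.norm(v)
    return v / n if n > 1.0 else v

def rollout(rng):
    """Simulate one episode with actions randomized within the expansion set.
    Returns True iff the reach-avoid specification is satisfied."""
    x = np.array([0.0, 0.0])
    for _ in range(HORIZON):
        a = nominal_controller(x) + rng.uniform(-EPS, EPS, size=2)  # explore around the nominal input
        x = x + DT * a                                              # assumed single-integrator dynamics
        if np.linalg.norm(x - OBS) < OBS_R:    # "always avoid obstacle" violated
            return False
        if np.linalg.norm(x - GOAL) < GOAL_R:  # "eventually reach goal" satisfied
            return True
    return False                               # goal not reached within the horizon

def verify(n_samples=10_000, seed=0, delta=0.01):
    """Estimate the satisfaction probability and a Hoeffding lower bound holding w.p. 1 - delta."""
    rng = np.random.default_rng(seed)
    successes = sum(rollout(rng) for _ in range(n_samples))
    p_hat = successes / n_samples
    lower = p_hat - np.sqrt(np.log(1 / delta) / (2 * n_samples))
    return p_hat, lower

if __name__ == "__main__":
    p_hat, lower = verify()
    print(f"estimated satisfaction probability: {p_hat:.3f} (>= {lower:.3f} with prob. 0.99)")
```

In the overall procedure, a sampling-based check of this kind would be run before learning (on the candidate controller) and again after learning (on the RL-improved agent), with exploration during RL restricted to the same expansion set used for verification.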