Deep reinforcement learning algorithms (DRL) are increasingly being used in safety-critical systems. Ensuring the safety of DRL agents is a critical concern in such contexts. However, relying solely on testing is not sufficient to ensure safety as it does not offer guarantees. Building safety monitors is one solution to alleviate this challenge. This paper proposes SMARLA, a machine learning-based safety monitoring approach designed for DRL agents. For practical reasons, SMARLA is agnostic to the type of DRL agent's inputs. Further, it is designed to be black-box (as it does not require access to the internals or training data of the agent) by leveraging state abstraction to facilitate the learning of safety violation prediction models from the agent's states using a reduced state space. We quantitatively and qualitatively validated SMARLA on three well-known RL case studies. Empirical results reveal that SMARLA achieves accurate violation prediction with a low false positive rate and can predict safety violations at an early stage, approximately halfway through the execution of the agent, before violations occur.
翻译:暂无翻译