In the last decade, reinforcement learning has successfully solved complex control tasks and decision-making problems, such as the board game Go. Yet there are few success stories when it comes to deploying these algorithms in real-world scenarios. One of the reasons is the lack of guarantees when dealing with and avoiding unsafe states, a fundamental requirement in critical control engineering systems. In this paper, we introduce Guided Safe Shooting (GuSS), a model-based RL approach that can learn to control systems with minimal violations of the safety constraints. The model is learned from data collected during the operation of the system in an iterated batch fashion, and is then used to plan the best action to perform at each time step. We propose three different safe planners, one based on a simple random shooting strategy and two based on MAP-Elites, a more advanced divergent-search algorithm. Experiments show that these planners help the learning agent avoid unsafe situations while maximally exploring the state space, a necessary aspect when learning an accurate model of the system. Furthermore, compared to model-free approaches, learning a model allows GuSS to reduce the number of interactions with the real system while still reaching high rewards, a fundamental requirement when handling engineering systems.
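To make the planning loop concrete, the sketch below shows a minimal safe random-shooting planner in the spirit of the simplest of the three planners described above. This is an illustrative assumption, not the paper's actual GuSS implementation: the `model(state, action) -> (next_state, reward, is_unsafe)` interface is hypothetical, standing in for the learned dynamics model and its safety predictor.

```python
import numpy as np

def safe_random_shooting(model, state, n_candidates=100, horizon=5,
                         action_dim=1, rng=None):
    """Illustrative safe random-shooting planner (hypothetical sketch).

    Samples random action sequences, rolls each one out through the
    learned model, discards sequences predicted to reach an unsafe
    state, and returns the first action of the best safe sequence.
    """
    rng = rng or np.random.default_rng()
    best_return, best_first_action = -np.inf, None
    for _ in range(n_candidates):
        # Sample a candidate action sequence uniformly in [-1, 1].
        actions = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, total, safe = state, 0.0, True
        for a in actions:
            s, r, unsafe = model(s, a)  # one step of the learned model
            if unsafe:
                safe = False  # reject the whole sequence
                break
            total += r
        # Among safe candidates, keep the one with the highest
        # predicted cumulative reward.
        if safe and total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action  # None if no safe candidate was found
```

At each real time step the agent would call this planner on the current state, execute only the returned first action, and replan, a standard receding-horizon scheme. The MAP-Elites-based planners would replace the uniform sampling with a divergent search over the candidate sequences.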