To apply reinforcement learning algorithms safely to high-dimensional nonlinear dynamical systems, a simplified system model is used to formulate a safe reinforcement learning framework. Based on the simplified system model, a low-dimensional representation of the safe region is identified and used to provide safety estimates for learning algorithms. However, finding a satisfactory simplified system model for a complex dynamical system usually requires considerable effort. To overcome this limitation, we propose a general data-driven approach that efficiently learns a low-dimensional representation of the safe region. Through an online adaptation method, the low-dimensional representation is updated using feedback data so that more accurate safety estimates are obtained. The performance of the proposed approach in identifying the low-dimensional representation of the safe region is demonstrated with a quadcopter example. The results show that, compared to previous work, a more reliable and representative low-dimensional representation of the safe region is derived, which extends the applicability of the safe reinforcement learning framework.