Recent rapid developments in reinforcement learning algorithms have opened up novel possibilities in many fields. However, because of their exploratory nature, the associated risk must be taken into account when these algorithms are applied to safety-critical problems, especially in real environments. In this study, we address a safe exploration problem in reinforcement learning in the presence of disturbances. We define safety during learning as the satisfaction of constraint conditions explicitly defined in terms of the state, and we propose a safe exploration method that uses partial prior knowledge of the controlled object and the disturbance. The proposed method guarantees satisfaction of the explicit state constraints with a pre-specified probability even when the controlled object is exposed to a stochastic disturbance following a normal distribution. As theoretical results, we derive sufficient conditions for constructing the conservative, non-exploratory inputs used in the proposed method, and we prove that safety in the above sense is guaranteed. Furthermore, we demonstrate the validity and effectiveness of the proposed method through numerical simulations of an inverted pendulum and a four-bar parallel link robot manipulator.
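To make the probabilistic guarantee concrete, the sketch below illustrates the standard Gaussian chance-constraint tightening, not the paper's actual construction: for a hypothetical linear state constraint a^T x <= b and an additive disturbance w ~ N(0, Sigma) with known covariance (the kind of partial prior knowledge the abstract assumes), a per-step violation probability delta translates into a deterministic back-off on the disturbance-free prediction. All names and numbers here are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm


def tightened_bound(a, b, Sigma, delta):
    """Deterministic tightening of the chance constraint
    Pr[a^T x_next <= b] >= 1 - delta, where x_next = mean + w
    and w ~ N(0, Sigma). The mean must satisfy
    a^T mean <= b - Phi^{-1}(1 - delta) * sqrt(a^T Sigma a)."""
    back_off = norm.ppf(1.0 - delta) * np.sqrt(a @ Sigma @ a)
    return b - back_off


def is_safe(a, b, mean_next, Sigma, delta):
    """Check whether a candidate input's disturbance-free one-step
    prediction satisfies the tightened (conservative) constraint."""
    return a @ mean_next <= tightened_bound(a, b, Sigma, delta)


# Illustrative example loosely inspired by the pendulum simulation:
# constrain the first state component (e.g., an angle) from above.
a = np.array([1.0, 0.0])          # constraint direction
b = 0.5                           # upper bound on the angle [rad]
Sigma = np.diag([0.01, 0.02])     # assumed disturbance covariance
delta = 0.05                      # allowed per-step violation probability

mean_next = np.array([0.3, 0.1])  # assumed disturbance-free prediction
print(is_safe(a, b, mean_next, Sigma, delta))
```

A conservative fallback input in this spirit would be one whose predicted mean clears the tightened bound; the actual sufficient conditions in the paper are more involved and apply to the specific system class studied there.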