Recent rapid developments in reinforcement learning algorithms have opened up new possibilities in many fields. However, because of their exploratory nature, these algorithms carry risks that must be taken into account when they are applied to safety-critical problems, especially in real environments. In this study, we address a safe exploration problem in reinforcement learning under disturbance. We define safety during learning as the satisfaction of constraint conditions explicitly defined in terms of the state, and we propose a safe exploration method that uses partial prior knowledge of the controlled object and the disturbance. The proposed method guarantees satisfaction of the explicit state constraints with a pre-specified probability even when the controlled object is exposed to a stochastic disturbance following a normal distribution. As theoretical results, we derive sufficient conditions for constructing the conservative, non-exploratory inputs used in the proposed method, and we prove that safety in the above sense is guaranteed. Furthermore, we demonstrate the validity and effectiveness of the proposed method through numerical simulations of an inverted pendulum and a four-bar parallel link robot manipulator.