When learning policies for robotic systems from data, safety is a major concern, as violating safety constraints may cause hardware damage. SafeOpt is an efficient Bayesian optimization (BO) algorithm that can learn policies while guaranteeing safety with high probability. However, its search space is limited to an initially given safe region. We extend this method with GoSafe, which explores outside the initial safe area while still guaranteeing safety with high probability. This is achieved by learning a set of initial conditions from which we can recover safely using a learned backup controller in case of a potential failure. We derive conditions for guaranteed convergence to the global optimum and validate GoSafe in hardware experiments.