In many learning-based control methodologies, learning the unknown dynamic model precedes the control phase, while the aim is to control the system such that it remains in some safe region of the state space. In this work, our aim is to guarantee safety while learning and control proceed simultaneously. Specifically, we consider the problem of safe learning in nonlinear control-affine systems subject to unknown additive uncertainty. We model the uncertainty as a Gaussian signal and use state measurements to learn its mean and covariance. We provide rigorous time-varying bounds on the mean and covariance of the uncertainty and employ them to modify the control input via an optimisation program with safety constraints encoded as a barrier function on the state space. We show that, with arbitrarily high probability, we can guarantee that the state will remain in the safe set while learning and control are carried out simultaneously, provided that the optimisation problem admits a feasible solution. We also provide a secondary, computationally more efficient formulation of this optimisation. It is based on tightening the safety constraints to counter the uncertainty about the learned mean and covariance. The magnitude of the tightening can be decreased as our confidence in the learned mean and covariance increases (i.e., as we gather more measurements about the environment). Extensions of the method are provided for Gaussian uncertainties with piecewise constant mean and covariance to accommodate more general environments.
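To make the constraint-tightening idea concrete, here is a minimal Python sketch under simplifying assumptions that are ours, not the paper's: a scalar discrete-time system x_{k+1} = x_k + (f(x_k) + g(x_k) u_k) dt + w_k with w_k ~ N(mu, sigma^2) unknown, a barrier h(x) = x_max - x, and a per-step chance constraint P[h(x_{k+1}) >= (1 - gamma) h(x_k)] >= 1 - delta. The helper names (`safety_filter`, `learned_bounds`, `simulate`) are hypothetical, and the 1/sqrt(n) margins in `learned_bounds` are an illustrative heuristic standing in for the paper's rigorous time-varying bounds; they carry no formal guarantee. With one scalar input and one linear constraint, the quadratic program min (u - u_nom)^2 has a closed-form solution, so no solver is needed.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def safety_filter(x, u_nom, f, g, dt, x_max, mu_ub, sigma_ub, gamma=0.5, delta=0.01):
    """Closed-form solution of min (u - u_nom)^2 s.t. g(x)*dt*u <= b.

    Safe set {x : h(x) >= 0} with h(x) = x_max - x (assumed barrier).
    Per-step chance constraint: P[h(x_next) >= (1 - gamma) * h(x)] >= 1 - delta,
    enforced with upper bounds mu_ub >= mu and sigma_ub >= sigma (needs delta < 0.5).
    """
    h = x_max - x
    z = NormalDist().inv_cdf(1.0 - delta)  # one-sided Gaussian quantile
    # x_next = x + (f + g*u)*dt + w, so we need (f + g*u)*dt + mu_ub + z*sigma_ub <= gamma*h
    b = gamma * h - f(x) * dt - mu_ub - z * sigma_ub
    a = g(x) * dt
    if a > 0:
        return min(u_nom, b / a)  # constraint is an upper bound on u
    if a < 0:
        return max(u_nom, b / a)  # dividing by a < 0 turns it into a lower bound
    if b >= 0:
        return u_nom  # input has no effect, but the constraint already holds
    raise ValueError("infeasible: no input satisfies the tightened constraint")


def learned_bounds(residuals, c=3.0):
    """HYPOTHETICAL confidence margins shrinking like 1/sqrt(n); not the paper's bounds."""
    n = len(residuals)
    mu_hat, sigma_hat = fmean(residuals), stdev(residuals)  # stdev needs n >= 2
    margin = c / math.sqrt(n)
    return mu_hat + margin * sigma_hat, sigma_hat * (1.0 + margin)


def simulate(steps=200, seed=0):
    """Learn the disturbance statistics online while filtering a nominal input."""
    rng = random.Random(seed)
    f = lambda x: 0.5  # assumed drift towards the boundary
    g = lambda x: 1.0  # assumed input gain
    dt, x_max = 0.05, 1.0
    true_mu, true_sigma = 0.02, 0.01  # unknown to the controller
    prior_mu_ub, prior_sigma_ub = 0.1, 0.05  # assumed conservative prior bounds
    x, residuals, xs = 0.0, [], []
    for _ in range(steps):
        if len(residuals) >= 2:
            mu_ub, sigma_ub = learned_bounds(residuals)
        else:
            mu_ub, sigma_ub = prior_mu_ub, prior_sigma_ub
        u = safety_filter(x, 1.0, f, g, dt, x_max, mu_ub, sigma_ub)  # u_nom = 1.0
        x_next = x + (f(x) + g(x) * u) * dt + rng.gauss(true_mu, true_sigma)
        # In this model the measured state reveals the disturbance sample directly.
        residuals.append(x_next - x - (f(x) + g(x) * u) * dt)
        x = x_next
        xs.append(x)
    return xs
```

For example, `xs = simulate()` returns the closed-loop trajectory; as residuals accumulate, the margins returned by `learned_bounds` shrink and the filter intervenes less, which informally mirrors how the tightening can be relaxed as confidence in the learned mean and covariance grows.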