In many learning-based control methodologies, learning the unknown dynamic model precedes the control phase, whose aim is to keep the system within some safe region of the state space. In this work, our aim is to guarantee safety while learning and control proceed simultaneously. Specifically, we consider the problem of safe learning in nonlinear control-affine systems subject to unknown additive uncertainty. We first model the uncertainty as Gaussian noise and use state measurements to learn its mean and covariance. We provide rigorous time-varying bounds on the mean and covariance of the uncertainty and employ them to modify the control input via an optimization program with potentially time-varying safety constraints. We show that, with arbitrarily high probability, the state is guaranteed to remain in the safe set while learning and control are carried out simultaneously, provided that the optimization problem admits a feasible solution. We also provide a secondary, computationally more efficient formulation of this optimization, based on tightening the safety constraints to counter the uncertainty in the learned mean and covariance. The magnitude of the tightening can be decreased as our confidence in the learned mean and covariance increases (i.e., as we gather more measurements about the environment). To accommodate more general environments, we extend the method to non-Gaussian process noise with unknown mean and covariance, as well as to Gaussian uncertainties with state-dependent mean and covariance.
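The abstract does not specify the discretization, the shape of the safe set, or the exact form of the time-varying bounds, so the following is only a minimal sketch under assumptions of our own, not the authors' method. It assumes an Euler-discretized model x_next = x + dt (f(x) + g(x) u) + w with known f and g, a half-space safe set a^T x <= b, and a hypothetical tightening margin c / sqrt(n) standing in for the paper's rigorous bounds. It estimates the noise mean and covariance from measured transitions, then minimally modifies a nominal input so that the tightened Gaussian chance constraint holds; with a single linear constraint, the optimization program reduces to a closed-form projection. The helper names `residual`, `estimate_noise`, and `safe_input` are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def residual(x, u, x_next, f, g, dt):
    """Noise sample implied by one measured transition of the assumed Euler model
    x_next = x + dt * (f(x) + g(x) @ u) + w."""
    return x_next - x - dt * (f(x) + g(x) @ u)


def estimate_noise(samples):
    """Sample mean and unbiased sample covariance of noise samples, shape (n, d)."""
    w = np.asarray(samples, dtype=float)
    d = w.shape[1]
    return w.mean(axis=0), np.cov(w, rowvar=False).reshape(d, d)


def safe_input(x, u_nom, f, g, a, b, mu_hat, sigma_hat, n,
               dt=0.1, delta=0.01, c=1.0):
    """Closest input to u_nom such that a @ x_next <= b holds with probability
    >= 1 - delta under the estimated Gaussian model, tightened by the
    hypothetical margin c / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - delta)   # one-sided Gaussian quantile
    margin = c / np.sqrt(n)                  # hypothetical; shrinks as data grows
    # The tightened chance constraint is linear in u: coeff @ u <= rhs.
    coeff = dt * (a @ g(x))
    rhs = (b - a @ (x + dt * f(x) + mu_hat)
           - z * np.sqrt(a @ sigma_hat @ a) - margin)
    violation = coeff @ u_nom - rhs
    if violation <= 0:
        return u_nom                         # nominal input already satisfies it
    norm2 = coeff @ coeff
    if norm2 == 0:
        raise ValueError("infeasible: the input cannot affect this constraint")
    # Minimizing ||u - u_nom||^2 over one half-space is a closed-form projection.
    return u_nom - (violation / norm2) * coeff


# Usage on a hypothetical 2-D single integrator with unknown additive noise.
rng = np.random.default_rng(0)
dt = 0.1
f = lambda x: np.zeros(2)
g = lambda x: np.eye(2)
true_mu, true_cov = np.array([0.05, 0.0]), 0.01 * np.eye(2)  # unknown to the controller

x, samples = np.zeros(2), []
for _ in range(500):                          # learn from state measurements
    u = rng.normal(size=2)
    x_next = x + dt * (f(x) + g(x) @ u) + rng.multivariate_normal(true_mu, true_cov)
    samples.append(residual(x, u, x_next, f, g, dt))
    x = x_next

mu_hat, sigma_hat = estimate_noise(samples)
a, b = np.array([1.0, 0.0]), 1.0              # safe set: first coordinate <= 1
x0 = np.array([0.5, 0.0])
u = safe_input(x0, np.array([3.0, 0.0]), f, g, a, b, mu_hat, sigma_hat,
               n=len(samples), dt=dt)
print("estimated mean:", mu_hat, "filtered input:", u)
```

Because the illustrative margin c / sqrt(n) shrinks as more residuals are collected, the filter intervenes less as confidence in the estimates grows, which mirrors the tightening idea described above. The values of `c`, `delta`, and `dt` are arbitrary choices for the example, not values from the paper.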