A fundamental challenge in learning to control an unknown dynamical system is to reduce model uncertainty by making measurements while maintaining safety. In this work, we formulate a mathematical definition of what it means to safely learn a dynamical system by sequentially deciding where to initialize the next trajectory. In our framework, the state of the system is required to stay within a given safety region under the (possibly repeated) action of all dynamical systems that are consistent with the information gathered so far. For our first two results, we consider the setting of safely learning linear dynamics. We present a linear programming-based algorithm that either safely recovers the true dynamics from trajectories of length one, or certifies that safe learning is impossible. We also give an efficient semidefinite representation of the set of initial conditions whose resulting trajectories of length two are guaranteed to stay in the safety region. For our final result, we study the problem of safely learning a nonlinear dynamical system. We give a second-order cone programming-based representation of the set of initial conditions that are guaranteed to remain in the safety region after one application of the system dynamics.
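To make the safety definition concrete in its simplest instance, here is a minimal Python sketch under illustrative assumptions that are not taken from the paper: the dynamics are linear, x_{t+1} = A x_t; the set of matrices consistent with the information gathered so far is an entrywise box [A_lo, A_hi]; and the safety region is a polytope {x : Gx ≤ h}. Under these assumptions, each row of G A x0 is linear in A, so its worst case over the box has a closed form. A general polyhedral uncertainty set would instead call for a linear program. The helper `recover_dynamics` shows why n length-one trajectories from linearly independent initial conditions determine A. All function names are hypothetical, and the sketch does not implement the paper's LP, SDP, or SOCP formulations.

```python
import numpy as np


def worst_case_step(x0, G, A_lo, A_hi):
    """Max over A in the (assumed) box [A_lo, A_hi] of G @ (A @ x0), per row.

    Row k of G A x0 is sum_{i,j} G[k, i] * A[i, j] * x0[j], which is linear in A,
    so each entry A[i, j] is set to A_hi where its coefficient is positive
    and to A_lo otherwise.
    """
    # coeff[k, i, j] = G[k, i] * x0[j]: coefficient of A[i, j] in row k.
    coeff = G[:, :, None] * x0[None, None, :]
    best = np.where(coeff > 0, A_hi[None], A_lo[None])
    return np.sum(coeff * best, axis=(1, 2))


def is_one_step_safe(x0, G, h, A_lo, A_hi):
    """True if x0 lies in {x : Gx <= h} and A @ x0 does too for every A in the box."""
    starts_safe = np.all(G @ x0 <= h)
    stays_safe = np.all(worst_case_step(x0, G, A_lo, A_hi) <= h)
    return bool(starts_safe and stays_safe)


def recover_dynamics(X, Y):
    """Recover A from length-one trajectories y_i = A x_i (columns of X and Y).

    Assumes X is square with linearly independent columns, so A = Y X^{-1};
    solving X^T A^T = Y^T avoids forming the inverse explicitly.
    """
    return np.linalg.solve(X.T, Y.T).T


if __name__ == "__main__":
    # Hypothetical example: safety region is the unit box ||x||_inf <= 1.
    G = np.vstack([np.eye(2), -np.eye(2)])
    h = np.ones(4)
    A0 = 0.5 * np.eye(2)
    print(is_one_step_safe(np.array([0.5, 0.5]), G, h, A0 - 0.1, A0 + 0.1))
```

Pushing every entry to a box vertex is only valid because the box constraints decouple across entries; that is the main reason this sketch assumes a box rather than a general polytope of consistent dynamics.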