We develop a learning-based control algorithm for unknown dynamical systems under severe data limitations: the algorithm has access only to streaming, noisy data from a single, ongoing trial. It compensates for this scarcity by effectively leveraging various forms of side information on the dynamics to reduce the sample complexity. Such side information typically comes from elementary laws of physics and qualitative properties of the system. More precisely, the algorithm approximately solves an optimal control problem encoding the system's desired behavior. To this end, it constructs and iteratively refines a data-driven differential inclusion that contains the unknown vector field of the dynamics. Used within an interval Taylor-based method, the differential inclusion makes it possible to over-approximate the set of states the system may reach. Theoretically, we establish a bound on the suboptimality of the approximate solution with respect to the optimal control under known dynamics, and we show that the bound tightens as the trial lengthens or as more side information becomes available. Empirically, experiments in a high-fidelity F-16 aircraft simulator and in MuJoCo environments illustrate that, despite the scarcity of data, the algorithm provides performance comparable to reinforcement learning algorithms trained over millions of environment interactions. Moreover, the algorithm outperforms existing techniques that combine system identification with model predictive control.
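To make the reachability idea concrete, the following is a minimal, hedged sketch (not the paper's algorithm) of a first-order interval Taylor step in one dimension. Here `F` is a hypothetical stand-in for the data-driven differential inclusion: given a state box `[lo, hi]`, it returns guaranteed lower and upper bounds on the unknown vector field over that box, and repeated interval steps propagate a box that over-approximates (to first order in the step size) the reachable states.

```python
# Illustrative sketch only, under assumed interfaces: F(lo, hi) returns
# (f_lo, f_hi), guaranteed bounds on the unknown dynamics xdot = f(x)
# over the entire state interval [lo, hi].

def reach_overapprox(x_lo, x_hi, F, dt, steps):
    """Propagate a 1-D state interval through the inclusion F.

    Returns the sequence of boxes (x_lo, x_hi); each box encloses,
    to first order in dt, the states the true system can reach.
    """
    boxes = [(x_lo, x_hi)]
    for _ in range(steps):
        f_lo, f_hi = F(x_lo, x_hi)  # bounds on xdot over the current box
        # Push each endpoint by the worst-case admissible velocity.
        x_lo, x_hi = x_lo + dt * f_lo, x_hi + dt * f_hi
        boxes.append((x_lo, x_hi))
    return boxes

# Example: xdot = -x plus an unknown disturbance bounded by 0.1, so the
# inclusion over a box [lo, hi] is [-hi - 0.1, -lo + 0.1].
F = lambda lo, hi: (-hi - 0.1, -lo + 0.1)
boxes = reach_overapprox(0.9, 1.1, F, dt=0.01, steps=100)
```

The boxes widen over time because the uncertainty in the inclusion compounds; in the paper's setting, longer trials and more side information shrink the inclusion and hence the over-approximation.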