Non-Parametric Neuro-Adaptive Control Subject to Task Specifications

We develop a learning-based algorithm for the control of autonomous systems governed by unknown, nonlinear dynamics to satisfy user-specified spatio-temporal tasks expressed as signal temporal logic specifications. Most existing algorithms either assume certain parametric forms for the unknown dynamic terms or resort to unnecessarily large control inputs in order to provide theoretical guarantees. The proposed algorithm addresses these drawbacks by integrating neural-network-based learning with adaptive control. More specifically, the algorithm learns a controller, represented as a neural network, using training data that correspond to a collection of system parameters and tasks. These parameters and tasks are derived by varying the nominal parameters and the spatio-temporal constraints of the user-specified task, respectively. It then incorporates this neural network into an online closed-form adaptive control policy in such a way that the resulting behavior satisfies the user-defined task. The proposed algorithm does not use any a priori information on the unknown dynamic terms or any approximation schemes. We provide formal theoretical guarantees on the satisfaction of the task. Numerical experiments on a robotic manipulator and a unicycle robot demonstrate that the proposed algorithm guarantees the satisfaction of 50 user-defined tasks, and outperforms control policies that do not employ online adaptation or the neural-network controller. Finally, we show that the proposed algorithm achieves greater performance than standard reinforcement-learning algorithms in the pendulum benchmarking environment.
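As an illustration of what "satisfying a signal temporal logic specification" means quantitatively, the following minimal sketch computes the standard robustness value of two simple spatio-temporal constraints over a sampled scalar trajectory: "always stay inside an interval during a time window" and "eventually enter an interval during a time window". A positive value means the sampled trajectory satisfies the constraint with that margin. The function names, the scalar signal, and the interval-shaped predicates are assumptions made for this example; they are not taken from the paper, and the paper's controller is not reproduced here.

```python
from typing import List, Sequence


def _margins(signal: Sequence[float], times: Sequence[float],
             t_start: float, t_end: float,
             lower: float, upper: float) -> List[float]:
    """Signed distance to the interval [lower, upper] for samples in the window.

    Hypothetical helper: positive when lower < x < upper, negative outside.
    """
    margins = [min(x - lower, upper - x)
               for x, t in zip(signal, times)
               if t_start <= t <= t_end]
    if not margins:
        raise ValueError("no samples fall inside the time window")
    return margins


def always_robustness(signal, times, t_start, t_end, lower, upper) -> float:
    """Robustness of G_[t_start, t_end] (lower <= x <= upper): worst-case margin."""
    return min(_margins(signal, times, t_start, t_end, lower, upper))


def eventually_robustness(signal, times, t_start, t_end, lower, upper) -> float:
    """Robustness of F_[t_start, t_end] (lower <= x <= upper): best-case margin."""
    return max(_margins(signal, times, t_start, t_end, lower, upper))


if __name__ == "__main__":
    # Illustrative trajectory sampled once per time unit (assumed data).
    times = [0.0, 1.0, 2.0, 3.0, 4.0]
    signal = [0.0, 0.5, 1.0, 1.5, 2.0]

    stay = always_robustness(signal, times, 2.0, 4.0, 0.5, 2.5)
    reach = eventually_robustness(signal, times, 0.0, 4.0, 1.8, 2.2)
    print("always margin:", stay)
    print("eventually margin:", reach)
```

Varying `t_start`, `t_end`, `lower`, and `upper` is one simple way to picture the "varied spatio-temporal constraints" mentioned in the abstract, although how the authors actually generate their training tasks is not specified here.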
