This paper considers the data-driven linear-quadratic regulation (LQR) problem in which the system parameters are unknown and must be identified in real time. In contrast to existing system identification and data-driven control methods, which typically require either offline data collection or multiple resets, we propose an online, non-episodic algorithm that learns the system from a single trajectory. The algorithm guarantees that both the identification error and the suboptimality gap of the control performance along this trajectory converge to zero almost surely. Furthermore, we characterize the almost-sure convergence rates of identification and control, and reveal an optimal trade-off between exploration and exploitation. A numerical example illustrates the effectiveness of the proposed strategy.
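To make the single-trajectory setting concrete, the following is a minimal sketch and not the algorithm proposed in the paper, which the abstract does not specify. It assumes a generic certainty-equivalence scheme: the pair (A, B) is estimated by regularized least squares along one trajectory, the feedback gain is periodically recomputed from the current estimate, and Gaussian exploration noise with standard deviation decaying as (t+1)^(-beta) is added to the input. The function names, the decay exponent `beta`, the update interval, the noise levels, and the example system are all illustrative assumptions.

```python
import numpy as np


def lqr_gain(A, B, Q, R, iters=300):
    """Approximate the infinite-horizon LQR gain K (u = -K x) by Riccati iteration."""
    P = Q.copy()
    K = np.zeros((B.shape[1], A.shape[0]))
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K


def run_online_lqr(A, B, Q, R, T=2000, beta=0.25, update_every=50,
                   noise_std=0.1, seed=0):
    """Identify (A, B) and control the system along one trajectory, with no resets.

    A and B are the true (unknown to the controller) dynamics; they are used
    only to simulate the next state. All tuning constants are assumptions.
    """
    rng = np.random.default_rng(seed)
    n, m = B.shape
    gram = np.eye(n + m)            # regularized sum of z z^T, z = [x; u]
    cross = np.zeros((n + m, n))    # sum of z x_next^T
    theta = np.zeros((n + m, n))    # least-squares estimate of [A B]^T
    K = np.zeros((m, n))            # start with no feedback
    x = np.zeros(n)
    for t in range(T):
        # Exploitation term (-K x) plus decaying exploration noise.
        explore = (t + 1) ** (-beta) * rng.standard_normal(m)
        u = -K @ x + explore
        x_next = A @ x + B @ u + noise_std * rng.standard_normal(n)

        # Accumulate least-squares statistics from this single trajectory.
        z = np.concatenate([x, u])
        gram += np.outer(z, z)
        cross += np.outer(z, x_next)

        # Periodically re-estimate the model and redesign the gain.
        if (t + 1) % update_every == 0:
            theta = np.linalg.solve(gram, cross)
            A_hat, B_hat = theta[:n].T, theta[n:].T
            K = lqr_gain(A_hat, B_hat, Q, R)
        x = x_next
    return theta[:n].T, theta[n:].T, K


if __name__ == "__main__":
    # Hypothetical open-loop stable example system.
    A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
    B_true = np.array([[0.0], [1.0]])
    A_hat, B_hat, K = run_online_lqr(A_true, B_true, np.eye(2), np.eye(1))
    print("identification error:", np.linalg.norm(A_hat - A_true))
    print("gain:", K)
```

In this sketch the exponent `beta` is the knob that trades exploration against exploitation: slower decay injects more excitation and speeds up identification at the cost of control performance, while faster decay does the opposite. The example system is open-loop stable so that the initial zero gain is safe; an unstable plant would need an initial stabilizing gain, which the sketch does not provide.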