The Linear-Quadratic Regulation (LQR) problem with unknown system parameters has been widely studied, but it has remained unclear whether $\tilde{ \mathcal{O}}(\sqrt{T})$ regret, the best known dependence on time, can be achieved almost surely. In this paper, we propose an adaptive LQR controller with an almost sure $\tilde{ \mathcal{O}}(\sqrt{T})$ regret upper bound. The controller features a circuit-breaking mechanism, which circumvents potential safety breaches and guarantees convergence of the system parameter estimate. The mechanism is shown to be triggered only finitely often, and hence has a negligible effect on the asymptotic performance of the controller. The proposed controller is also validated via simulation on the Tennessee Eastman Process~(TEP), a commonly used industrial process example.
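To make the idea of a circuit-breaking adaptive controller concrete, the following is a minimal sketch on a scalar system $x_{t+1} = a x_t + b u_t + w_t$, not the controller analysed in the paper. Several things here are assumptions made purely for illustration: the plant is taken to be open-loop stable so that zero input is a safe fallback, the breaker trips when the state exceeds a slowly growing threshold or the certainty-equivalent gain becomes too large, the exploration noise decays as $t^{-1/4}$, and the names `lqr_gain`, `simulate`, `k_max`, and `hold` are hypothetical. The abstract does not specify the actual trigger conditions.

```python
import numpy as np


def lqr_gain(a, b, q=1.0, r=1.0, iters=200, p_max=1e6):
    """Scalar LQR gain k (for u = -k x) via Riccati fixed-point iteration."""
    p = q
    for _ in range(iters):
        # Scalar Riccati recursion, written in a form that stays positive;
        # p is capped so that poor estimates cannot cause overflow.
        p = min(q + a * a * p * r / (r + b * b * p), p_max)
    return a * b * p / (r + b * b * p)


def simulate(T=2000, a=0.8, b=1.0, seed=0, k_max=5.0, hold=10):
    """Certainty-equivalent adaptive LQR with an illustrative circuit breaker.

    Assumes |a| < 1 (open-loop stable), so u = 0 is a safe fallback input.
    Returns the state trajectory, the final estimate of (a, b), and the
    number of times the breaker tripped.
    """
    rng = np.random.default_rng(seed)
    V = np.eye(2)        # regularised Gram matrix of regressors z = (x, u)
    s = np.zeros(2)      # running sum of z * x_next
    theta = np.zeros(2)  # least-squares estimate of (a, b)
    xs = np.empty(T)
    x, triggers, off_until = 0.0, 0, 0

    for t in range(1, T + 1):
        k = lqr_gain(float(theta[0]), float(theta[1]))
        x_max = 10.0 * (1.0 + np.log(t))  # assumed slowly growing threshold

        # Circuit breaker: trip on a large state or a large gain.
        if t >= off_until and (abs(x) > x_max or abs(k) > k_max):
            triggers += 1
            off_until = t + hold

        if t < off_until:
            u = 0.0  # breaker open: fall back to zero input
        else:
            # Certainty-equivalent control plus decaying exploration noise.
            u = -k * x + t ** -0.25 * rng.standard_normal()

        x_next = a * x + b * u + rng.standard_normal()

        # Regularised least-squares update of the parameter estimate.
        z = np.array([x, u])
        V += np.outer(z, z)
        s += z * x_next
        theta = np.linalg.solve(V, s)

        xs[t - 1] = x
        x = x_next

    return xs, theta, triggers
```

Calling `simulate()` returns the trajectory, the final parameter estimate, and the trigger count, so one can inspect how often the breaker fires under different seeds or thresholds. The sketch makes no claim about regret rates; it only illustrates how a fallback mode can keep the state bounded while the estimate is still poor.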