In this paper, we revisit the discrete-time linear quadratic regulator (LQR) problem from the perspective of receding-horizon policy gradient (RHPG), a recently developed model-free learning framework for control applications. We provide a fine-grained sample-complexity analysis of RHPG for learning a control policy that is both stabilizing and $\epsilon$-close to the optimal LQR solution, and our algorithm does not require a known stabilizing control policy for initialization. Together with the recent application of RHPG to learning the Kalman filter, these results demonstrate the general applicability of RHPG to linear control and estimation with streamlined analyses.
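For context, a minimal sketch of the discrete-time LQR problem referred to above, stated in standard finite-horizon form; the symbols $A$, $B$, $Q$, $R$, $Q_f$, and the horizon $N$ are generic placeholders and the paper's exact notation may differ:
$$
\min_{u_0,\dots,u_{N-1}} \; \mathbb{E}\Big[\sum_{t=0}^{N-1}\big(x_t^\top Q x_t + u_t^\top R u_t\big) + x_N^\top Q_f x_N\Big] \quad \text{s.t.} \quad x_{t+1} = A x_t + B u_t,
$$
whose optimal solution is a linear state-feedback policy $u_t = -K_t x_t$; policy-gradient methods such as RHPG search over such feedback gains directly, without identifying the system matrices.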