This paper studies an infinite-horizon optimal control problem for discrete-time linear systems with quadratic criteria, where both the system and the criteria have random parameters that are independent and identically distributed over time. In this general setting, we apply the policy gradient method, a reinforcement learning technique, to search for the optimal control without requiring knowledge of the statistical information of the parameters. We investigate the sub-Gaussianity of the state process and establish a global linear convergence guarantee for this approach under assumptions that are weaker and easier to verify than those in existing results. Numerical experiments are presented to illustrate our result.
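To make the setting concrete, the following is a minimal sketch of a model-free policy gradient iteration on a scalar instance of the problem. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the scalar dynamics, the parameter distributions, the truncated horizon, the two-point finite-difference gradient estimate, and all names (`rollout_cost`, `policy_gradient`, `step`, `delta`) are hypothetical choices made for this example. The learner only observes sampled costs and never reads the distributions of the random parameters.

```python
import numpy as np

def rollout_cost(k, rng, n_rollouts=400, horizon=30):
    """Average truncated cost of the linear policy u_t = -k * x_t.

    Assumed scalar model (illustrative only):
        x_{t+1} = a_t * x_t + b_t * u_t,  cost_t = q_t * x_t**2 + r_t * u_t**2,
    with (a_t, b_t, q_t, r_t) drawn i.i.d. over time.
    """
    x = np.ones(n_rollouts)            # fixed initial state x_0 = 1
    total = np.zeros(n_rollouts)
    for _ in range(horizon):
        # Random parameters, hidden from the learner (assumed distributions).
        a = rng.normal(0.8, 0.1, n_rollouts)
        b = rng.normal(1.0, 0.1, n_rollouts)
        q = rng.uniform(0.5, 1.5, n_rollouts)
        r = rng.uniform(0.5, 1.5, n_rollouts)
        u = -k * x
        total += q * x**2 + r * u**2
        x = a * x + b * u
    return total.mean()

def policy_gradient(k0=0.0, step=0.02, delta=0.05, iters=50, seed=0):
    """Gradient descent on the gain k using sampled costs only."""
    rng = np.random.default_rng(seed)
    k = k0                             # k0 must be a stabilizing gain
    for _ in range(iters):
        # Two-point finite-difference estimate of dC/dk from rollouts.
        grad = (rollout_cost(k + delta, rng)
                - rollout_cost(k - delta, rng)) / (2 * delta)
        k -= step * grad
    return k
```

Calling `policy_gradient()` returns a learned gain, and comparing `rollout_cost` at that gain and at the initial gain `k0` gives a rough check that the cost has decreased. The step size and perturbation radius are tuned by hand for this toy instance; too large a step from a barely stabilizing gain can push the iterate into the unstable region.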