Owing to the growing interest in reinforcement learning over the last few years, gradient-based policy optimization methods have also been gaining popularity for control problems. This is with good reason: policy gradient methods optimize the metric of interest in an end-to-end manner and are relatively easy to implement without complete knowledge of the underlying system. In this paper, we study the global convergence of gradient-based policy optimization methods for the quadratic control of discrete-time, model-free Markovian jump linear systems (MJLS). We address the challenges that arise from the presence of multiple system modes coupled with the lack of knowledge of the system dynamics, and we show global convergence of the policy using both gradient descent and the natural policy gradient method. We also provide simulation studies to corroborate our claims.
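For concreteness, a standard formulation of the problem described above is sketched below; the notation is assumed for illustration and is not drawn from this abstract. An MJLS switches among $N$ modes according to a Markov chain $\omega(t)$, with dynamics $x_{t+1} = A_{\omega(t)} x_t + B_{\omega(t)} u_t$, and the policy is a mode-dependent linear state feedback $u_t = -K_{\omega(t)} x_t$ with gains $K = (K_1, \dots, K_N)$. The quadratic cost being optimized is
\[
J(K) \;=\; \mathbb{E}\left[\sum_{t=0}^{\infty} \left( x_t^\top Q_{\omega(t)} x_t + u_t^\top R_{\omega(t)} u_t \right)\right],
\]
and the two first-order methods studied here update the gains, with step size $\eta > 0$, as
\[
K \;\leftarrow\; K - \eta\, \nabla J(K) \qquad \text{(gradient descent)},
\]
\[
K \;\leftarrow\; K - \eta\, \nabla J(K)\, \Sigma_K^{-1} \qquad \text{(natural policy gradient)},
\]
where $\Sigma_K$ denotes the (mode-dependent) state correlation matrix induced by the closed-loop system under $K$. In the model-free setting, $\nabla J(K)$ and $\Sigma_K$ are not computed from $(A_i, B_i)$ but are estimated from sampled trajectories.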