An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods

In this paper, we revisit and improve the convergence analysis of policy gradient (PG) and natural PG (NPG) methods, and of their variance-reduced variants, under general smooth policy parametrizations. More specifically, assuming the Fisher information matrix of the policy is positive definite: i) we show that a state-of-the-art variance-reduced PG method, which had previously only been shown to converge to stationary points, converges to the globally optimal value up to some inherent function approximation error due to policy parametrization; ii) we show that NPG enjoys a lower sample complexity; iii) we propose SRVR-NPG, which incorporates variance reduction into the NPG update. Our improvements follow from the observation that the convergence analyses of (variance-reduced) PG and NPG methods can improve each other: the stationary convergence analysis of PG can be applied to NPG as well, and the global convergence analysis of NPG can help to establish the global convergence of (variance-reduced) PG methods. Our analysis carefully integrates the advantages of these two lines of work. Thanks to this improvement, we have also made variance reduction for NPG possible, with both global convergence and an efficient finite-sample complexity.
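To make the relationship between the PG and NPG updates concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a one-step problem with a Gaussian policy N(theta, sigma^2 I) of fixed standard deviation, for which the Fisher information matrix is positive definite, and a quadratic reward chosen purely for illustration. All names (`score`, `npg_step`, `run_npg`, `target`) are hypothetical, and no variance reduction (as in SRVR-NPG) is applied.

```python
import numpy as np


def score(theta, sigma, actions):
    """Score function grad_theta log pi_theta(a) for a Gaussian policy N(theta, sigma^2 I)."""
    return (actions - theta) / sigma**2


def npg_step(theta, sigma, reward_fn, n_samples, lr, rng):
    """One natural policy gradient step from Monte Carlo samples (illustrative only)."""
    # Sample actions from the current policy.
    actions = theta + sigma * rng.standard_normal((n_samples, theta.size))
    rewards = reward_fn(actions)
    s = score(theta, sigma, actions)

    # Vanilla PG estimate: sample mean of reward * score.
    grad = (rewards[:, None] * s).mean(axis=0)

    # Fisher information estimate: sample mean of score * score^T.
    fisher = s.T @ s / n_samples

    # NPG preconditions the PG direction with the inverse Fisher matrix.
    direction = np.linalg.solve(fisher, grad)
    return theta + lr * direction, grad, fisher


def run_npg(target, sigma=0.5, n_samples=2000, lr=1.0, n_iters=50, seed=0):
    """Run NPG on an assumed toy reward r(a) = -||a - target||^2."""
    rng = np.random.default_rng(seed)
    target = np.asarray(target, dtype=float)

    def reward_fn(actions):
        return -np.sum((actions - target) ** 2, axis=1)

    theta = np.zeros_like(target)
    fisher = None
    for _ in range(n_iters):
        theta, _, fisher = npg_step(theta, sigma, reward_fn, n_samples, lr, rng)
    return theta, fisher


if __name__ == "__main__":
    theta, fisher = run_npg([1.0, -1.0])
    print("final policy mean:", theta)
    print("Fisher eigenvalues:", np.linalg.eigvalsh(fisher))
```

Replacing `np.linalg.solve(fisher, grad)` with `grad` gives the plain PG step for the same samples. For this toy policy the two directions differ only by a factor of roughly sigma^2, because the true Fisher matrix is I / sigma^2.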
