Policy gradient (PG) gives rise to a rich class of reinforcement learning (RL) methods. Recently, there has been an emerging trend to accelerate existing PG methods such as REINFORCE via \emph{variance reduction} techniques. However, all existing variance-reduced PG methods rely heavily on an uncheckable importance-weight assumption that must hold at every iteration of the algorithm. In this paper, we propose a simple gradient truncation mechanism to address this issue. Moreover, we design a Truncated Stochastic Incremental Variance-Reduced Policy Gradient (TSIVR-PG) method, which is able to maximize not only a cumulative sum of rewards but also a general utility function of a policy's long-term visitation distribution. We show an $\tilde{\mathcal{O}}(\epsilon^{-3})$ sample complexity for TSIVR-PG to find an $\epsilon$-stationary policy. By assuming overparameterization of the policy and exploiting the hidden convexity of the problem, we further show that TSIVR-PG converges to a globally $\epsilon$-optimal policy with $\tilde{\mathcal{O}}(\epsilon^{-2})$ samples.
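For intuition, the following is a minimal sketch of the kind of update the truncation targets; the exact TSIVR-PG estimator and step rule are specified in the body of the paper, and the symbols here ($g$ for a single-trajectory REINFORCE-style gradient estimate, $\omega$ for the trajectory importance weight between consecutive policies, $B$ for the mini-batch size, $\eta$ for the step size, and $\delta$ for the truncation radius) are illustrative assumptions rather than the paper's notation:
\[
v_t \;=\; v_{t-1} \;+\; \frac{1}{B}\sum_{i=1}^{B}\Big(g\big(\tau^{(i)}_t;\theta_t\big) \;-\; \omega\big(\tau^{(i)}_t;\theta_{t-1},\theta_t\big)\, g\big(\tau^{(i)}_t;\theta_{t-1}\big)\Big),
\qquad
\theta_{t+1} \;=\; \theta_t \;+\; \min\!\Big\{1,\;\tfrac{\delta}{\eta\|v_t\|}\Big\}\,\eta\, v_t .
\]
Truncating the per-iteration movement of $\theta$ to a ball of radius $\delta$ keeps consecutive policies close, which is what allows the importance weights $\omega$ to remain bounded without an additional per-iteration assumption.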