政策优化与肉皮镜源 (Policy Optimization with Stochastic Mirror Descent) - 专知论文

会员服务 ·

0

Extensibility · 样本 · 优化器 · 样本复杂度 · 驻点 ·

2021 年 12 月 3 日

Policy Optimization with Stochastic Mirror Descent

翻译：政策优化与肉皮镜源

Long Yang,Yu Zhang,Gang Zheng,Qian Zheng,Pengfei Li,Jun Wen,Gang Pan

Improving sample efficiency has been a longstanding goal in reinforcement learning. This paper proposes $\mathtt{VRMPO}$ algorithm: a sample efficient policy gradient method with stochastic mirror descent. In $\mathtt{VRMPO}$, a novel variance-reduced policy gradient estimator is presented to improve sample efficiency. We prove that the proposed $\mathtt{VRMPO}$ needs only $\mathcal{O}(\epsilon^{-3})$ sample trajectories to achieve an $\epsilon$-approximate first-order stationary point, which matches the best sample complexity for policy optimization. The extensive experimental results demonstrate that $\mathtt{VRMPO}$ outperforms the state-of-the-art policy gradient methods in various settings.

翻译：提高抽样效率一直是加强学习的一个长期目标。本文提出了 $\ matht{ VRMPO} 算法 : 一种具有随机镜像底部的抽样有效政策梯度方法。在 $\ matht{ VRMPO} $ 中, 提出了一个新的差异变换政策梯度估计值, 以提高抽样效率。我们证明, $\ matht{ VRMPO} 的建议只需要 $\ mathcal{ O} (\ psilon}-3} ) 美元样本轨迹来实现 $\ explon$- 近似第一阶固定点, 与政策优化的最佳样本复杂性相匹配。广泛的实验结果表明, $\ matht{ VRMPO} 显示, $matht{ VRMPO} 美元在各种环境下都比最先进的政策梯度方法要好。

0

相关内容

Extensibility

iOS 8 提供的应用间和应用跟系统的功能交互特性。

Today (iOS and OS X): widgets for the Today view of Notification Center
Share (iOS and OS X): post content to web services or share content with others
Actions (iOS and OS X): app extensions to view or manipulate inside another app
Photo Editing (iOS): edit a photo or video in Apple's Photos app with extensions from a third-party apps
Finder Sync (OS X): remote file storage in the Finder with support for Finder content annotation
Storage Provider (iOS): an interface between files inside an app and other apps on a user's device
Custom Keyboard (iOS): system-wide alternative keyboards

Source: iOS 8 Extensions: Apple’s Plan for a Powerful App Ecosystem

【Google】梯度下降，48页ppt

【Google】梯度下降，48页ppt

专知会员服务

81+阅读 · 2020年12月5日

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

专知会员服务

60+阅读 · 2020年11月21日

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

专知会员服务

19+阅读 · 2020年6月29日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

OpenAI丨深度强化学习关键论文列表

OpenAI丨深度强化学习关键论文列表

中国人工智能学会

17+阅读 · 2018年11月10日

【OpenAI】深度强化学习关键论文列表

【OpenAI】深度强化学习关键论文列表

专知

12+阅读 · 2018年11月10日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

Policy Optimization for Stochastic Shortest Path

Arxiv

0+阅读 · 2022年2月7日

A Dimension-Insensitive Algorithm for Stochastic Zeroth-Order Optimization

Arxiv

0+阅读 · 2022年2月7日

Convergence of a robust deep FBSDE method for stochastic control

Arxiv

0+阅读 · 2022年2月6日

Policy Optimization for Constrained MDPs with Provable Fast Global Convergence

Arxiv

0+阅读 · 2022年2月3日

Escape saddle points by a simple gradient-descent based algorithm

Arxiv

4+阅读 · 2021年11月28日

VIP会员

文章信息

相关主题

样本复杂度

相关VIP内容

【Google】梯度下降，48页ppt

【Google】梯度下降，48页ppt

专知会员服务

81+阅读 · 2020年12月5日

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

专知会员服务

60+阅读 · 2020年11月21日

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

【ICML2020】噪声在随机梯度下降中的泛化效益，On the Generalization Benefit of Noise in Stochastic Gradient Descent

专知会员服务

19+阅读 · 2020年6月29日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

《基于AI的动态任务分配策略实现多智能体系统有意义人类控制》报告

《超越连接：AI驱动网络未来愿景》最新报告

人工智能赋能多域作战：能力与挑战

《战场空间决策优势：AI基础与应用研究》总结报告

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

OpenAI丨深度强化学习关键论文列表

OpenAI丨深度强化学习关键论文列表

中国人工智能学会

17+阅读 · 2018年11月10日

【OpenAI】深度强化学习关键论文列表

【OpenAI】深度强化学习关键论文列表

专知

12+阅读 · 2018年11月10日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Policy Optimization for Stochastic Shortest Path

Arxiv

0+阅读 · 2022年2月7日

A Dimension-Insensitive Algorithm for Stochastic Zeroth-Order Optimization

Arxiv

0+阅读 · 2022年2月7日

Convergence of a robust deep FBSDE method for stochastic control

Arxiv

0+阅读 · 2022年2月6日

Policy Optimization for Constrained MDPs with Provable Fast Global Convergence

Arxiv

0+阅读 · 2022年2月3日

Escape saddle points by a simple gradient-descent based algorithm

Arxiv

4+阅读 · 2021年11月28日

微信扫码咨询专知VIP会员