Quasi-Newton policy gradient algorithms - 专知 Papers

Quasi-Newton methods · Shannon entropy · regularization term · state-of-the-art · Shannon

April 22, 2022

Quasi-Newton policy gradient algorithms

Haoya Li, Samarth Gupta, Hsiangfu Yu, Lexing Ying, Inderjit Dhillon

From arXiv, 18 pages, 10 figures

Policy gradient algorithms have been widely applied to Markov decision process and reinforcement learning problems in recent years. Regularization with various entropy functions is often used to encourage exploration and improve stability. In this paper, we propose a quasi-Newton method for the policy gradient algorithm with entropy regularization. In the case of Shannon entropy, the resulting algorithm reproduces the natural policy gradient algorithm. For other entropy functions, this method results in brand new policy gradient algorithms. We provide a simple proof that all these algorithms enjoy the Newton-type quadratic convergence and that the corresponding gradient flow converges globally to the optimal solution. Using both synthetic and industrial-scale examples, we demonstrate that the proposed quasi-Newton method typically converges in single-digit iterations, often orders of magnitude faster than other state-of-the-art algorithms.
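
As a rough illustration of the Shannon-entropy case mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a small tabular MDP in reward-maximization form with a random transition tensor `P`, reward table `r`, discount `gamma`, and regularization weight `tau`; all of these names and values are hypothetical. It uses the commonly cited full-step form of the entropy-regularized natural policy gradient update, in which the new policy is proportional to exp(Q/tau) (also known as soft policy iteration). The paper's own formulation and step size may differ.

```python
import numpy as np


def log_softmax(x):
    # Row-wise log-softmax with max subtraction for numerical stability.
    z = x - x.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def soft_evaluate(P, r, log_pi, gamma, tau):
    # Entropy-regularized policy evaluation:
    # solve (I - gamma * P_pi) V = sum_a pi(a|s) * (r(s,a) - tau * log pi(a|s)).
    pi = np.exp(log_pi)
    P_pi = np.einsum("sa,sat->st", pi, P)          # state-to-state transitions under pi
    reward = (pi * (r - tau * log_pi)).sum(axis=1)  # expected reward plus entropy bonus
    V = np.linalg.solve(np.eye(P.shape[0]) - gamma * P_pi, reward)
    Q = r + gamma * (P @ V)                         # shape (S, A)
    return V, Q


def entropy_npg(P, r, gamma=0.8, tau=0.1, max_iters=200, tol=1e-12):
    # Full-step entropy-regularized NPG (assumed form): pi_new(.|s) = softmax(Q(s,.) / tau).
    S, A = r.shape
    log_pi = np.full((S, A), -np.log(A))            # start from the uniform policy
    for k in range(1, max_iters + 1):
        _, Q = soft_evaluate(P, r, log_pi, gamma, tau)
        new_log_pi = log_softmax(Q / tau)
        change = np.abs(np.exp(new_log_pi) - np.exp(log_pi)).max()
        log_pi = new_log_pi
        if change < tol:
            break
    V, Q = soft_evaluate(P, r, log_pi, gamma, tau)
    return np.exp(log_pi), V, Q, k


def soft_bellman_residual(V, Q, tau):
    # At the regularized optimum, V(s) = tau * logsumexp(Q(s,.) / tau).
    m = Q.max(axis=1)
    soft_max = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))
    return np.abs(V - soft_max).max()


# Hypothetical random MDP with 5 states and 3 actions.
rng = np.random.default_rng(0)
S, A = 5, 3
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)                   # normalize rows into probabilities
r = rng.random((S, A))

pi, V, Q, iters = entropy_npg(P, r)
residual = soft_bellman_residual(V, Q, tau=0.1)
print("iterations:", iters, "soft Bellman residual:", residual)
```

The printed residual measures how closely the final policy satisfies the regularized optimality condition, and the iteration count gives a feel for how quickly this kind of update settles on a toy problem. Entropy functions other than Shannon entropy would need a different update rule, which this sketch does not attempt.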

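Related content

Quasi-Newton methods

Quasi-Newton methods are among the most effective methods for solving nonlinear optimization problems. They were proposed in the 1950s by W. C. Davidon, a physicist at Argonne National Laboratory in the United States. At the time, Davidon's algorithm was regarded as one of the most creative inventions in the field of nonlinear optimization. Soon afterwards, R. Fletcher and M. J. D. Powell showed that the new algorithm was far faster and more reliable than other methods, and the discipline of nonlinear optimization advanced rapidly almost overnight.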
