Entropy-Regul化自然政策梯度与线性函数接近度的线性趋同 (Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation) - 专知论文

会员服务 ·

0

线性的 · 近似 · 正则化项 · 泛函 · Performer ·

2021 年 8 月 11 日

Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation

翻译：Entropy-Regul化自然政策梯度与线性函数接近度的线性趋同

Semih Cayci,Niao He,R. Srikant

Natural policy gradient (NPG) methods with function approximation achieve impressive empirical success in reinforcement learning problems with large state-action spaces. However, theoretical understanding of their convergence behaviors remains limited in the function approximation setting. In this paper, we perform a finite-time analysis of NPG with linear function approximation and softmax parameterization, and prove for the first time that widely used entropy regularization method, which encourages exploration, leads to linear convergence rate. Under considerably weaker regularity conditions, we prove that entropy-regularized Q-NPG variant with linear function approximation achieves $\tilde{O}(1/T)$ convergence rate. We adopt a Lyapunov drift analysis to prove the convergence results and explain the effectiveness of entropy regularization in improving the convergence rates.

翻译：具有功能近似值的自然政策梯度(NPG)方法在利用大型国家行动空间强化学习问题方面取得了令人印象深刻的成功经验。然而,在功能近似设置方面,对其趋同行为的理论理解仍然有限。在本文中,我们对NPG进行有线性函数近似值和软负负参数化的有限时间分析,并首次证明广泛使用的鼓励探索的英特罗比规范化方法导致线性趋同率。在相当弱的常规性条件下,我们证明具有线性函数近似值的英特罗比常规Q-NPG变方达到了 $\ tde{O}(1/T)$的趋同率。我们采用了Lyapunov 漂移分析,以证明趋同结果并解释英特罗比规范化在改善趋同率方面的有效性。

0

相关内容

线性的

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

45+阅读 · 2020年10月31日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

80+阅读 · 2020年7月26日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【Google DeepMind & 斯坦福 AAAI2020】Options of Interest Temporal Abstraction with Interest Function

【Google DeepMind & 斯坦福 AAAI2020】Options of Interest Temporal Abstraction with Interest Function

专知会员服务

5+阅读 · 2020年1月5日

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

专知会员服务

18+阅读 · 2019年11月1日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

逆强化学习几篇论文笔记

逆强化学习几篇论文笔记

CreateAMind

9+阅读 · 2018年12月13日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Adaptive Temporal Difference Learning with Linear Function Approximation

Arxiv

0+阅读 · 2021年10月11日

Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization

Arxiv

0+阅读 · 2021年10月11日

A Control-Theoretic Perspective on Optimal High-Order Optimization

Arxiv

0+阅读 · 2021年10月9日

Gradient Flows and Nonlinear Power Methods for the Computation of Nonlinear Eigenfunctions

Arxiv

0+阅读 · 2021年10月9日

Efficient Local Planning with Linear Function Approximation

Arxiv

0+阅读 · 2021年10月7日

On the Sample Complexity of Actor-Critic Method for Reinforcement Learning with Function Approximation

Arxiv

0+阅读 · 2021年10月7日

Reintroducing Straight-Through Estimators as Principled Methods for Stochastic Binary Networks

Arxiv

0+阅读 · 2021年10月7日

A Modern Introduction to Online Learning

A Modern Introduction to Online Learning

Arxiv

21+阅读 · 2019年12月31日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

Variance-based regularization with convex objectives

Arxiv

5+阅读 · 2017年12月14日

VIP会员

文章信息

相关主题

相关VIP内容

【Google】深度学习对抗鲁棒性，43页ppt

专知会员服务

45+阅读 · 2020年10月31日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

80+阅读 · 2020年7月26日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【Google DeepMind & 斯坦福 AAAI2020】Options of Interest Temporal Abstraction with Interest Function

【Google DeepMind & 斯坦福 AAAI2020】Options of Interest Temporal Abstraction with Interest Function

专知会员服务

5+阅读 · 2020年1月5日

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

【ICCV 2019 Toturial】Global Optimization for Geometric Understanding with Provable Guarantees（具有可证明保证的几何理解的全局优化）

专知会员服务

18+阅读 · 2019年11月1日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

《生成式人工智能与大/小语言模型在供应链管理决策优化与可持续性提升中的作用评估》最新51页

白宫发布《赢得AI竞赛：美国人工智能行动计划》最新28页

地下战：地下空间的战略博弈

《美地下作战条令手册》228页

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

逆强化学习几篇论文笔记

逆强化学习几篇论文笔记

CreateAMind

9+阅读 · 2018年12月13日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Adaptive Temporal Difference Learning with Linear Function Approximation

Arxiv

0+阅读 · 2021年10月11日

Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization

Arxiv

0+阅读 · 2021年10月11日

A Control-Theoretic Perspective on Optimal High-Order Optimization

Arxiv

0+阅读 · 2021年10月9日

Gradient Flows and Nonlinear Power Methods for the Computation of Nonlinear Eigenfunctions

Arxiv

0+阅读 · 2021年10月9日

Efficient Local Planning with Linear Function Approximation

Arxiv

0+阅读 · 2021年10月7日

On the Sample Complexity of Actor-Critic Method for Reinforcement Learning with Function Approximation

Arxiv

0+阅读 · 2021年10月7日

Reintroducing Straight-Through Estimators as Principled Methods for Stochastic Binary Networks

Arxiv

0+阅读 · 2021年10月7日

A Modern Introduction to Online Learning

A Modern Introduction to Online Learning

Arxiv

21+阅读 · 2019年12月31日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

Variance-based regularization with convex objectives

Arxiv

5+阅读 · 2017年12月14日

微信扫码咨询专知VIP会员