Natural policy gradient (NPG) methods with entropy regularization achieve impressive empirical success in reinforcement learning problems with large state-action spaces. However, their convergence properties and the impact of entropy regularization remain elusive in the function approximation regime. In this paper, we establish finite-time convergence analyses of entropy-regularized NPG with linear function approximation under softmax parameterization. In particular, we prove that entropy-regularized NPG with averaging satisfies the \emph{persistence of excitation} condition, and achieves a fast convergence rate of $\tilde{O}(1/T)$ up to a function approximation error in regularized Markov decision processes. This convergence result does not require any a priori assumptions on the policies. Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits \emph{linear convergence} up to a function approximation error.
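For concreteness, the display below sketches the standard entropy-regularized objective and the softmax NPG update with linear function approximation; the notation ($\phi$ for the feature map, $\theta$ for the policy parameter, $\tau$ for the regularization weight, $\eta$ for the step size, $d_k$ for the sampling distribution) is illustrative and not taken from the paper, and the regression target may be the soft $Q$-function or the soft advantage depending on the variant.
\begin{align*}
&\text{Regularized value:} && V_\tau^{\pi}(\rho) = \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty}\gamma^{t}\big(r(s_t,a_t) - \tau \log \pi(a_t \mid s_t)\big) \,\Big|\, s_0 \sim \rho\Big],\\
&\text{Log-linear policy:} && \pi_{\theta}(a \mid s) \propto \exp\big(\phi(s,a)^{\top}\theta\big),\\
&\text{NPG update:} && \theta_{k+1} = \theta_k + \eta\, w_k, \qquad
w_k \in \operatorname*{arg\,min}_{w}\; \mathbb{E}_{(s,a)\sim d_k}\Big[\big(\phi(s,a)^{\top} w - Q_{\tau}^{\pi_{\theta_k}}(s,a)\big)^{2}\Big].
\end{align*}
In the tabular softmax case this update is known to reduce to a multiplicative soft policy iteration of the form $\pi_{k+1}(a \mid s) \propto \pi_k(a \mid s)^{1-\alpha} \exp\big(\alpha\, Q_\tau^{\pi_k}(s,a)/\tau\big)$, with an effective step size $\alpha \in (0,1]$ determined by $\eta$, $\tau$, and the discount factor, which is typically the source of the contraction that yields linear convergence in the tabular setting.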