Linear Convergence for Natural Policy Gradient with Log-linear Policy Parametrization


30 September 2022


Carlo Alfano, Patrick Rebeschini

We analyze the convergence rate of the unregularized natural policy gradient algorithm with log-linear policy parametrizations in infinite-horizon discounted Markov decision processes. In the deterministic case, when the Q-value is known and can be approximated by a linear combination of a known feature function up to a bias error, we show that a geometrically-increasing step size yields a linear convergence rate towards an optimal policy. We then consider the sample-based case, when the best representation of the Q-value function among linear combinations of a known feature function is known up to an estimation error. In this setting, we show that the algorithm enjoys the same linear guarantees as in the deterministic case up to an error term that depends on the estimation error, the bias error, and the condition number of the feature covariance matrix. Our results build upon the general framework of policy mirror descent and extend previous findings for the softmax tabular parametrization to the log-linear policy class.
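To make the deterministic setting in the abstract concrete, here is a minimal NumPy sketch of a Q-value-based natural policy gradient loop with a log-linear policy. It is an illustration under stated assumptions, not the authors' implementation: the function names (`log_linear_policy`, `exact_q`, `q_npg`), the uniform weighting of state-action pairs in the least-squares fit, and the step-size growth factor of 1/gamma are all choices made here for the example. The abstract only states that the step size increases geometrically.

```python
import numpy as np

def log_linear_policy(theta, phi):
    """pi(a|s) proportional to exp(theta . phi(s, a)); phi has shape (S, A, d)."""
    logits = phi @ theta                          # shape (S, A)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def exact_q(P, r, pi, gamma):
    """Exact Q-values of policy pi. P has shape (S, A, S), r has shape (S, A)."""
    S = r.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)         # state-to-state kernel under pi
    r_pi = (pi * r).sum(axis=1)                   # expected one-step reward
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return r + gamma * (P @ v)                    # shape (S, A)

def q_npg(P, r, phi, gamma=0.9, eta0=1.0, iters=40):
    """Fit Q linearly in the features, then step theta along the fitted weights."""
    S, A, d = phi.shape
    X = phi.reshape(S * A, d)
    theta = np.zeros(d)
    eta = eta0
    values = []                                   # mean state value per iteration
    for _ in range(iters):
        pi = log_linear_policy(theta, phi)
        q = exact_q(P, r, pi, gamma)
        values.append(float((pi * q).sum(axis=1).mean()))
        # Assumption: unweighted least squares over all (s, a) pairs.
        w, *_ = np.linalg.lstsq(X, q.reshape(-1), rcond=None)
        theta = theta + eta * w
        eta = eta / gamma                         # assumption: growth factor 1/gamma
    return theta, values

# Hypothetical random MDP with 4-dimensional random features.
rng = np.random.default_rng(0)
S, A, d = 5, 3, 4
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
r = rng.random((S, A))
phi = rng.standard_normal((S, A, d))
theta, values = q_npg(P, r, phi)
```

With one-hot features (`d = S * A`) the linear fit is exact, so the bias error is zero and the loop reduces to the softmax tabular case that the abstract says these results extend. With fewer features, as in the demo, the fit is only approximate, which corresponds to the bias error term in the abstract.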

