We propose the homotopic policy mirror descent (HPMD) method for solving discounted, infinite-horizon MDPs with finite state and action spaces, and study its policy convergence. We report three properties that appear to be new in the policy gradient literature: (1) HPMD exhibits global linear convergence of the value optimality gap, and local superlinear convergence of the policy to the set of optimal policies with order $\gamma^{-2}$. The superlinear convergence of the policy takes effect after at most $\mathcal{O}(\log(1/\Delta^*))$ iterations, where $\Delta^*$ is defined via a gap quantity associated with the optimal state-action value function. (2) HPMD also exhibits last-iterate convergence of the policy, with the limiting policy corresponding exactly to the optimal policy with maximal entropy at every state. No regularization is added to the optimization objective, so this second observation arises solely as an algorithmic property of the homotopic policy gradient method. (3) For the stochastic HPMD method, we further demonstrate a better-than-$\mathcal{O}(|\mathcal{S}| |\mathcal{A}| / \epsilon^2)$ sample complexity for small optimality gaps $\epsilon$, assuming a generative model for policy evaluation.
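To make the method concrete, a schematic per-state update is sketched below. This is an illustrative reading of a mirror-descent step with a diminishing perturbation toward an initial policy; the symbols $\eta_k$ (stepsize), $\tau_k$ (perturbation, i.e., homotopy parameter), $\pi_0$ (initial policy, e.g., uniform), and $D$ (Bregman divergence, e.g., KL) are notation assumed here rather than taken from the abstract, and the exact schedules are specified in the body of the paper:
\[
\pi_{k+1}(\cdot \mid s) \;\in\; \operatorname*{argmin}_{p \in \Delta_{|\mathcal{A}|}}
\Big\{ \eta_k \big[ \langle Q^{\pi_k}(s,\cdot),\, p \rangle + \tau_k\, D\big(p, \pi_0(\cdot \mid s)\big) \big]
+ D\big(p, \pi_k(\cdot \mid s)\big) \Big\}, \qquad \forall\, s \in \mathcal{S},
\]
where $Q^{\pi_k}$ is the state-action value function of the current policy and $\tau_k \downarrow 0$ along the iterations, so the perturbation vanishes (the "homotopy") while the MDP objective itself remains unregularized.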