We propose the homotopic policy mirror descent (HPMD) method for solving discounted, infinite-horizon MDPs with finite state and action spaces, and study its policy convergence. We report several properties that appear to be new in the literature of policy gradient methods: (1) HPMD exhibits global linear convergence of the value optimality gap, and local superlinear convergence of the policy to the set of optimal policies with order $\gamma^{-2}$. The superlinear convergence of the policy takes effect after at most $\mathcal{O}(\log(1/\Delta^*))$ iterations, where $\Delta^*$ is defined via a gap quantity associated with the optimal state-action value function; (2) HPMD also exhibits last-iterate convergence of the policy, with the limiting policy corresponding exactly to the optimal policy with maximal entropy for every state. No regularization is added to the optimization objective, and hence the second observation arises solely as an algorithmic property of the homotopic policy gradient method; (3) The last-iterate convergence of HPMD holds for a much broader class of decomposable distance-generating functions, including the $p$-th power of the $\ell_p$-norm and the negative Tsallis entropy. As a byproduct of the analysis, we also discover the finite-time exact convergence of HPMD with these divergences, and show that HPMD continues converging to the limiting policy even if the current policy is already optimal; (4) For the stochastic HPMD method, we further demonstrate that a sample complexity better than $\mathcal{O}(|\mathcal{S}| |\mathcal{A}| / \epsilon^2)$ holds with high probability for small optimality gap $\epsilon$, when assuming a generative model for policy evaluation.
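For orientation, a minimal sketch of the update underlying such methods: policy mirror descent performs a per-state proximal step against the current state-action value function, and the homotopic idea is to anneal an additional perturbation toward the uniform policy. The specific perturbation form and the schedules $\eta_k$, $\tau_k$ shown below are illustrative assumptions, not the paper's tuned choices:
\[
  \pi_{k+1}(\cdot \mid s) \;\in\; \operatorname*{argmin}_{p \,\in\, \Delta(\mathcal{A})}
  \Big\{ \eta_k \big[ \langle Q^{\pi_k}(s,\cdot),\, p \rangle
        + \tau_k \, D\big(p,\, \pi_0(\cdot \mid s)\big) \big]
        + D\big(p,\, \pi_k(\cdot \mid s)\big) \Big\},
  \qquad \tau_k \downarrow 0,
\]
where $D$ denotes the Bregman divergence induced by the chosen distance-generating function (e.g., the KL divergence when the distance-generating function is the negative entropy) and $\pi_0$ is the uniform policy. Sending $\tau_k \to 0$ traces a homotopy from the perturbed problem back to the unregularized objective, which is consistent with the maximal-entropy limiting policy described in item (2).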