Policy Optimization for Constrained MDPs with Provable Fast Global Convergence - Zhuanzhi Papers


February 3, 2022

Policy Optimization for Constrained MDPs with Provable Fast Global Convergence


Tao Liu, Ruida Zhou, Dileep Kalathil, P. R. Kumar, Chao Tian

We address the problem of finding the optimal policy of a constrained Markov decision process (CMDP) using a gradient descent-based algorithm. Previous results have shown that a primal-dual approach can achieve an $\mathcal{O}(1/\sqrt{T})$ global convergence rate for both the optimality gap and the constraint violation. We propose a new algorithm, called the policy mirror descent-primal dual (PMD-PD) algorithm, that can provably achieve a faster $\mathcal{O}(\log(T)/T)$ convergence rate for both the optimality gap and the constraint violation. For the primal (policy) update, the PMD-PD algorithm utilizes a modified value function and performs natural policy gradient steps, which is equivalent to a mirror descent step with appropriate regularization. For the dual update, the PMD-PD algorithm uses modified Lagrange multipliers to ensure a faster convergence rate. We also present two extensions of this approach to settings with zero constraint violation and with sample-based estimation. Experimental results demonstrate the faster convergence rate and the better performance of the PMD-PD algorithm compared with existing policy gradient-based algorithms.
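The abstract describes a primal update that is a natural policy gradient (NPG) step, equivalent to KL-regularized mirror descent, paired with a dual update on a Lagrange multiplier. The sketch below illustrates only that generic primal-dual structure on a tabular CMDP with exact (iterative) policy evaluation, assuming the constraint form $V_g^{\pi}(\rho) \ge b$ and the Lagrangian $V_r^{\pi}(\rho) + \lambda (V_g^{\pi}(\rho) - b)$. It is not PMD-PD: the modified value function and modified Lagrange multipliers mentioned in the abstract are not specified here, so plain ones are used instead. With tabular softmax policies, an NPG step reduces (up to a rescaling of the step size) to the multiplicative update $\pi_{t+1}(a \mid s) \propto \pi_t(a \mid s) \exp(\eta\, Q^{\pi_t}_{r+\lambda g}(s,a))$, which is what `mirror_step` computes. The function names, data layout, step sizes, and toy MDP are all hypothetical.

```python
import math

# Hypothetical tabular CMDP layout (an assumption, not from the paper):
#   P[s][a][s2] transition probabilities, r[s][a] reward, g[s][a] constraint utility,
#   rho[s] initial distribution; the constraint is V_g^pi(rho) >= b.


def evaluate(P, c, pi, gamma, iters=500):
    """Iterative policy evaluation of per-step signal c under pi; returns (V, Q)."""
    n_s, n_a = len(c), len(c[0])
    V = [0.0] * n_s
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        Q = [[c[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        V = [sum(pi[s][a] * Q[s][a] for a in range(n_a)) for s in range(n_s)]
    return V, Q


def mirror_step(pi_s, q_s, eta):
    """KL mirror-ascent step at one state: p(a) proportional to pi_s(a) * exp(eta * q_s(a))."""
    m = max(q_s)  # shift for numerical stability; it cancels after normalization
    w = [p * math.exp(eta * (q - m)) for p, q in zip(pi_s, q_s)]
    z = sum(w)
    return [x / z for x in w]


def primal_dual_pmd(P, r, g, rho, b, gamma=0.9, eta=0.1, eta_dual=0.1, T=200):
    """Plain primal-dual policy mirror descent (a baseline sketch, not PMD-PD)."""
    n_s, n_a = len(r), len(r[0])
    pi = [[1.0 / n_a] * n_a for _ in range(n_s)]  # start from the uniform policy
    lam = 0.0
    history = []  # (V_r(rho), V_g(rho), lambda) recorded before each update
    for _ in range(T):
        Vr, Qr = evaluate(P, r, pi, gamma)
        Vg, Qg = evaluate(P, g, pi, gamma)
        vr = sum(rho[s] * Vr[s] for s in range(n_s))
        vg = sum(rho[s] * Vg[s] for s in range(n_s))
        history.append((vr, vg, lam))
        # Primal: mirror ascent on the Lagrangian Q-function Q_r + lam * Q_g, per state.
        pi = [mirror_step(pi[s], [Qr[s][a] + lam * Qg[s][a] for a in range(n_a)], eta)
              for s in range(n_s)]
        # Dual: projected descent; lam grows while the constraint V_g >= b is violated.
        lam = max(0.0, lam - eta_dual * (vg - b))
    return pi, lam, history


# Toy one-state CMDP: action 0 earns reward, action 1 earns constraint utility.
# With gamma = 0.9, V_g = 10 * pi(1), so b = 5 requires pi(1) >= 0.5.
P = [[[1.0], [1.0]]]
r = [[1.0, 0.0]]
g = [[0.0, 1.0]]
pi, lam, hist = primal_dual_pmd(P, r, g, rho=[1.0], b=5.0)
print("final policy:", pi[0], "lambda:", round(lam, 3))
```

In a plain scheme like this, the policy and multiplier can oscillate around the constrained optimum ($\pi(1) = 0.5$, $\lambda = 1$ in the toy example) rather than settle quickly; the abstract attributes PMD-PD's faster $\mathcal{O}(\log(T)/T)$ rate to its modified value function and modified Lagrange multipliers.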

