Despite its popularity in the reinforcement learning community, a provably convergent policy gradient method for general continuous space-time stochastic control problems has been elusive. This paper closes the gap by proposing a proximal gradient algorithm for feedback controls of finite-time horizon stochastic control problems. The state dynamics are continuous-time nonlinear diffusions with controlled drift and possibly degenerate noise, and the objectives are nonconvex in the state and nonsmooth in the control. We prove under suitable conditions that the algorithm converges linearly to a stationary point of the control problem, and is stable with respect to policy updates by approximate gradient steps. The convergence result justifies the recent reinforcement learning heuristics that adding entropy regularization or a fictitious discount factor to the optimization objective accelerates the convergence of policy gradient methods. The proof exploits careful regularity estimates of backward stochastic differential equations.
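For orientation, the following is a minimal schematic of the kind of problem and update the abstract describes: a finite-horizon control problem over feedback controls of a diffusion with controlled drift, a running cost that is nonsmooth in the control, and a proximal gradient step on the feedback policy. The notation below ($b$, $\sigma$, $f$, $g$, $h$, $\phi$, $\tau$, etc.) is not fixed by the abstract and is chosen purely for illustration; it is a sketch under standard assumptions, not the paper's exact formulation.

```latex
% Schematic only; all symbols are illustrative placeholders (assumes amsmath).
\begin{align*}
  &\text{State dynamics (controlled drift, possibly degenerate noise):}\\
  &\qquad \mathrm{d}X_t = b\bigl(t, X_t, \phi(t, X_t)\bigr)\,\mathrm{d}t
      + \sigma(t, X_t)\,\mathrm{d}W_t, \qquad X_0 = x_0,\\[4pt]
  &\text{Cost over feedback controls } \phi,\ \text{split into a smooth part and a}\\
  &\text{nonsmooth-in-the-control part } h:\\
  &\qquad J(\phi) = \underbrace{\mathbb{E}\!\left[\int_0^T
        f\bigl(t, X_t, \phi(t, X_t)\bigr)\,\mathrm{d}t + g(X_T)\right]}_{J_{\mathrm{smooth}}(\phi)}
      \;+\; \mathbb{E}\!\left[\int_0^T h\bigl(\phi(t, X_t)\bigr)\,\mathrm{d}t\right],\\[4pt]
  &\text{Proximal gradient update of the feedback control, stepsize } \tau > 0:\\
  &\qquad \phi^{k+1} = \operatorname{prox}_{\tau h}\!\bigl(\phi^{k}
      - \tau\,\nabla_\phi J_{\mathrm{smooth}}(\phi^{k})\bigr).
\end{align*}
```

In this reading, the gradient of the smooth part would be represented through adjoint processes, which is where the regularity estimates for backward stochastic differential equations mentioned in the abstract would enter; the precise construction is given in the paper itself.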