This paper proposes a relaxed control regularization with general exploration rewards to design robust feedback controls for multi-dimensional continuous-time stochastic exit time problems. We establish that the regularized control problem admits a H\"{o}lder continuous feedback control, and demonstrate that both the value function and the feedback control of the regularized control problem are Lipschitz stable with respect to parameter perturbations. Moreover, we show that a pre-computed feedback relaxed control performs robustly in a perturbed system, and derive a first-order sensitivity equation for both the value function and the optimal feedback relaxed control. These stability results provide a theoretical justification for the recent reinforcement learning heuristic that including an exploration reward in the optimization objective leads to more robust decision making. We finally prove first-order monotone convergence of the value functions for relaxed control problems with vanishing exploration parameters, which subsequently enables us to construct a pure exploitation strategy for the original control problem based on the feedback relaxed controls.
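As a minimal illustration of the regularization in question (a sketch under assumed generic notation, not the paper's own; the paper allows general exploration rewards, of which entropy is only one common instance), a relaxed exit time problem with exploration parameter $\lambda > 0$ may take the form
\[
V^{\lambda}(x) \;=\; \sup_{\nu}\; \mathbb{E}^{x}\!\left[\int_{0}^{\tau}\!\left(\int_{A} f(X_{t},a)\,\nu_{t}(\mathrm{d}a) \;+\; \lambda\,\mathcal{E}(\nu_{t})\right)\mathrm{d}t \;+\; g(X_{\tau})\right],
\]
where the control $\nu_{t}$ is a probability measure on the action space $A$ rather than a single action, $\tau$ is the exit time of the state process $X$ from a given domain, and $\mathcal{E}$ is an exploration reward; the differential entropy $\mathcal{E}(\nu) = -\int_{A} \tfrac{\mathrm{d}\nu}{\mathrm{d}a}\ln\tfrac{\mathrm{d}\nu}{\mathrm{d}a}\,\mathrm{d}a$ is one illustrative choice. Here $f$, $g$, $A$, and $X$ are generic placeholders and need not match the paper's notation.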