Value Function Estimators for Feynman-Kac Forward-Backward SDEs in Stochastic Optimal Control - Zhuanzhi Papers


Estimation/Estimators · Value function · Optimizer · Model evaluation · Functional

September 30, 2021

Value Function Estimators for Feynman-Kac Forward-Backward SDEs in Stochastic Optimal Control


Kelsey P. Hawkins, Ali Pakniyat, Panagiotis Tsiotras

From arXiv. arXiv admin note: text overlap with arXiv:2006.12444

Two novel numerical estimators are proposed for solving forward-backward stochastic differential equations (FBSDEs) appearing in the Feynman-Kac representation of the value function in stochastic optimal control problems. In contrast to the current numerical approaches which are based on the discretization of the continuous-time FBSDE, we propose a converse approach, namely, we obtain a discrete-time approximation of the on-policy value function, and then we derive a discrete-time estimator that resembles the continuous-time counterpart. The proposed approach allows for the construction of higher accuracy estimators along with error analysis. The approach is applied to the policy improvement step in reinforcement learning. Numerical results and error analysis are demonstrated using (i) a scalar nonlinear stochastic optimal control problem and (ii) a four-dimensional linear quadratic regulator (LQR) problem. The proposed estimators show significant improvement in terms of accuracy in both cases over Euler-Maruyama-based estimators used in competing approaches. In the case of LQR problems, we demonstrate that our estimators result in near machine-precision level accuracy, in contrast to previously proposed methods that can potentially diverge on the same problems.
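For orientation, the Euler-Maruyama baseline that the abstract compares against can be sketched on a toy problem. This is not the authors' proposed estimator. It is a minimal illustration that assumes a scalar linear closed-loop SDE dX = a X dt + sigma dW and an on-policy cost made of a running term q X^2 and a terminal term X_T^2. The function names and all parameters (`a`, `sigma`, `q`, `n_steps`, `n_paths`) are hypothetical, chosen only because this case has a closed-form value to compare with.

```python
import math
import random


def euler_maruyama_value(x0, a, sigma, q, T, n_steps, n_paths, seed=0):
    """Monte Carlo estimate of V(0, x0) = E[ int_0^T q X_t^2 dt + X_T^2 ]
    for the assumed SDE dX = a X dt + sigma dW, via Euler-Maruyama steps."""
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _path in range(n_paths):
        x = x0
        cost = 0.0
        for _step in range(n_steps):
            cost += q * x * x * dt  # left-endpoint rule for the running cost
            x += a * x * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        total += cost + x * x  # add the terminal cost X_T^2
    return total / n_paths


def exact_value(x0, a, sigma, q, T):
    """Closed-form value for the same toy problem (requires a != 0)."""
    e = math.exp(2.0 * a * T)
    # Second moment E[X_T^2] of the linear SDE.
    second_moment_T = x0**2 * e + sigma**2 * (e - 1.0) / (2.0 * a)
    # Time integral of E[X_t^2] over [0, T].
    integral = (x0**2 * (e - 1.0) / (2.0 * a)
                + sigma**2 / (2.0 * a) * ((e - 1.0) / (2.0 * a) - T))
    return q * integral + second_moment_T


if __name__ == "__main__":
    est = euler_maruyama_value(1.0, -1.0, 0.5, 1.0, 1.0, n_steps=50, n_paths=20000)
    ref = exact_value(1.0, -1.0, 0.5, 1.0, 1.0)
    print(f"Euler-Maruyama estimate: {est:.4f}, closed form: {ref:.4f}")
```

The gap between the two numbers combines time-discretization bias with Monte Carlo sampling noise. The discretization part is the error source that the abstract says the proposed estimators reduce.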

