December 7, 2020

Gradient Flows for Regularized Stochastic Control Problems

David Šiška, Łukasz Szpruch

This paper studies stochastic control problems regularized by the relative entropy, where the action space is the space of measures. This setting includes relaxed control problems and problems of finding Markovian controls with the control function replaced by an idealized infinitely wide neural network, and it can be extended to the search for causal optimal transport maps. By exploiting the Pontryagin optimality principle, we identify a suitable metric space on which we construct a gradient flow for the measure-valued control process, along which the cost functional is guaranteed to decrease. It is shown that, under appropriate conditions, this gradient flow has an invariant measure which is the optimal control for the regularized stochastic control problem. If the problem we work with is sufficiently convex, the gradient flow converges exponentially fast. Furthermore, the optimal measure-valued control admits a Bayesian interpretation, which means that one can incorporate prior knowledge when solving stochastic control problems. This work is motivated by a desire to extend the theoretical underpinning for the convergence of stochastic gradient type algorithms widely used in the reinforcement learning community to solve control problems.
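
The Bayesian interpretation mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's stochastic control setting or its gradient flow: it uses a static problem on a finite action set. It relies on the standard fact that minimizing the expected cost plus `tau` times the relative entropy with respect to a prior gives the Gibbs measure, proportional to `prior * exp(-cost / tau)`. The names `cost`, `prior`, `tau`, `regularized_objective` and `gibbs_minimizer` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def regularized_objective(m, cost, prior, tau):
    """Expected cost plus tau * KL(m || prior) for probability vectors."""
    m = np.asarray(m, dtype=float)
    support = m > 0  # the term 0 * log 0 is taken to be 0
    kl = np.sum(m[support] * np.log(m[support] / prior[support]))
    return float(np.dot(m, cost) + tau * kl)

def gibbs_minimizer(cost, prior, tau):
    """Closed-form minimizer: m* proportional to prior * exp(-cost / tau)."""
    logits = np.log(prior) - cost / tau
    logits -= logits.max()  # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

cost = np.array([1.0, 0.2, 0.7, 1.5])   # hypothetical cost of each action
prior = np.array([0.4, 0.1, 0.3, 0.2])  # hypothetical prior over actions
tau = 0.5                               # hypothetical regularization strength

m_star = gibbs_minimizer(cost, prior, tau)
best = regularized_objective(m_star, cost, prior, tau)

# Compare against random probability vectors: none should beat m_star.
rng = np.random.default_rng(0)
candidates = rng.dirichlet(np.ones(len(cost)), size=1000)
min_gap = min(
    regularized_objective(m, cost, prior, tau) - best for m in candidates
)
```

In this toy setting the prior plays the role of prior knowledge: as `tau` grows, `m_star` moves toward `prior`, and as `tau` shrinks it concentrates on the lowest-cost action.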
