Policy Gradient Methods for Discrete Time Linear Quadratic Regulator With Random Parameters - 专知 Papers

Policy gradient · Discrete · Gradient · Linear convergence · Optimal control

March 29, 2023

Policy Gradient Methods for Discrete Time Linear Quadratic Regulator With Random Parameters

From arXiv, 55 pages, 3 figures

This paper studies an infinite-horizon optimal control problem for discrete-time linear systems with quadratic criteria, where both the system and the criteria have random parameters that are independent and identically distributed over time. In this general setting, we apply the policy gradient method, a reinforcement learning technique, to search for the optimal control without requiring knowledge of the statistical information of the parameters. We investigate the sub-Gaussianity of the state process and establish a global linear convergence guarantee for this approach under assumptions that are weaker and easier to verify than those in existing results. Numerical experiments are presented to illustrate our results.
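To make the setting concrete, here is a minimal sketch of a sample-based policy gradient loop on a scalar toy problem. Everything in it is an assumption made for illustration and is not taken from the paper: the scalar dynamics x_{t+1} = A_t x_t + B_t u_t, the Gaussian and uniform parameter distributions, the truncation horizon, the step size, and the function names `sample_cost`, `exact_cost`, and `policy_gradient`. The gradient is approximated by a central finite difference of simulated costs with common random numbers, which is a simple stand-in for a model-free gradient estimate rather than the estimator analysed by the authors.

```python
import numpy as np


def sample_cost(K, rng, n_rollouts=500, horizon=50):
    """Monte Carlo estimate of the truncated cost of the linear policy u = -K x."""
    x = np.ones(n_rollouts)       # every rollout starts from x_0 = 1
    total = np.zeros(n_rollouts)
    for _ in range(horizon):
        # i.i.d. system and cost parameters, redrawn at every time step
        # (assumed distributions, chosen only for this toy example)
        A = rng.normal(0.9, 0.1, n_rollouts)
        B = rng.normal(0.5, 0.1, n_rollouts)
        Q = rng.uniform(0.5, 1.5, n_rollouts)
        R = rng.uniform(0.5, 1.5, n_rollouts)
        u = -K * x
        total += Q * x**2 + R * u**2
        x = A * x + B * u
    return total.mean()


def exact_cost(K):
    """Closed-form infinite-horizon cost of this toy model.

    Uses the parameter moments, so it serves only as a check;
    the learning loop below never calls it.
    """
    # rho = E[(A - B K)^2] for independent A and B
    rho = (0.9**2 + 0.1**2) - 2 * K * 0.9 * 0.5 + K**2 * (0.5**2 + 0.1**2)
    return (1.0 + K**2) / (1.0 - rho) if rho < 1 else float("inf")


def policy_gradient(K0=0.0, step=0.01, delta=0.05, iters=150, seed=0):
    """Gradient descent on the gain K using simulated costs only."""
    K = K0
    for i in range(iters):
        # Same seed for both evaluations (common random numbers),
        # so most of the sampling noise cancels in the difference.
        j_plus = sample_cost(K + delta, np.random.default_rng(seed + i))
        j_minus = sample_cost(K - delta, np.random.default_rng(seed + i))
        grad = (j_plus - j_minus) / (2 * delta)
        K -= step * grad
    return K


if __name__ == "__main__":
    K_final = policy_gradient()
    print("learned gain:", K_final)
    print("cost at K = 0:", exact_cost(0.0))
    print("cost at learned gain:", exact_cost(K_final))
```

The loop only observes simulated trajectories and costs, which mirrors the abstract's point that the statistics of the parameters need not be known. The closed-form `exact_cost` is available here only because the toy model is scalar with independent parameters.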
