The theory of reinforcement learning currently suffers from a mismatch between its empirical performance and its theoretical characterization, with consequences for, e.g., our understanding of sample efficiency, safety, and robustness. The linear quadratic regulator with unknown dynamics is a fundamental reinforcement learning setting with significant structure in its dynamics and cost function, yet even in this setting there is a gap between the best known regret lower bound of $\Omega_p(\sqrt{T})$ and the best known upper bound of $O_p(\sqrt{T}\,\text{polylog}(T))$. The contribution of this paper is to close that gap by establishing a novel regret upper bound of $O_p(\sqrt{T})$. Our proof is constructive in that it analyzes the regret of a concrete algorithm, and it simultaneously establishes an estimation error bound on the dynamics of $O_p(T^{-1/4})$, which is also the first to match the rate of a known lower bound. The two keys to our improved proof technique are (1) more precise upper and lower bounds on the system Gram matrix and (2) a self-bounding argument for the expected estimation error of the optimal controller.
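For context, the following is a minimal sketch of the standard online LQR regret formulation under typical assumptions; the symbols $A_*$, $B_*$, $Q$, $R$, and $J_*$ below are illustrative and need not match the notation used in the body of the paper.
% Sketch only: unknown dynamics (A_*, B_*), known costs (Q, R), i.i.d. noise w_t.
\begin{align*}
  x_{t+1} &= A_* x_t + B_* u_t + w_t, && \text{unknown } (A_*, B_*),\ w_t \text{ i.i.d. noise},\\
  c_t &= x_t^\top Q x_t + u_t^\top R u_t, && \text{known } Q \succ 0,\ R \succ 0,\\
  \mathrm{Regret}(T) &= \sum_{t=1}^{T} c_t - T\, J_*, && J_* \text{ the optimal long-run average cost}.
\end{align*}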