We consider the problem of controlling an unknown stochastic linear system with quadratic costs, known as the adaptive LQ control problem. We re-examine an approach called ``Reward Biased Maximum Likelihood Estimate'' (RBMLE) that was proposed more than forty years ago, and which predates the ``Upper Confidence Bound'' (UCB) method as well as the definition of ``regret'' for bandit problems. It simply adds a term favoring parameters with larger rewards to the criterion for parameter estimation. We show how the RBMLE and UCB methods can be reconciled, and thereby propose an Augmented RBMLE-UCB algorithm that combines the penalty of the RBMLE method with the constraints of the UCB method, uniting the two approaches to optimism in the face of uncertainty. We establish that, theoretically, this method retains the $\Tilde{\mathcal{O}}(\sqrt{T})$ regret guarantee, the best known so far. We further compare the empirical performance of the proposed Augmented RBMLE-UCB and the standard RBMLE (without the augmentation) against UCB, Thompson Sampling, Input Perturbation, Randomized Certainty Equivalence, and StabL on many real-world examples, including flight control of a Boeing 747 and of an Unmanned Aerial Vehicle. We perform extensive simulation studies showing that the Augmented RBMLE consistently outperforms UCB, Thompson Sampling, and StabL by a large margin, while it is marginally better than Input Perturbation and moderately better than Randomized Certainty Equivalence.
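A schematic form of the reward-biased criterion described above may help fix ideas; the symbols here are illustrative and not fixed by the abstract, with $L_t(\theta)$ denoting the likelihood of the data collected up to time $t$, $J^*(\theta)$ the optimal reward attainable if the true parameter were $\theta$, and $\alpha_t$ a bias weight:
\[
\hat{\theta}_t \in \arg\max_{\theta} \; \Big\{ \log L_t(\theta) + \alpha_t \, J^*(\theta) \Big\},
\]
so that, relative to plain maximum likelihood, the estimate is tilted toward parameters whose optimal reward $J^*(\theta)$ is larger, which is the source of the method's optimism.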