Learning to Control Partially Observed Systems with Finite Memory - 专知 (Zhuanzhi) Papers

Controller · Partially observable Markov decision process · Learning · Approximation · Markov chain

February 20, 2022

Learning to Control Partially Observed Systems with Finite Memory

Semih Cayci,Niao He,R. Srikant

We consider the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large or even countably infinite state spaces, where the controller has access to only noisy observations of the underlying controlled Markov chain. We consider a natural actor-critic method that employs a finite internal memory for policy parameterization, and a multi-step temporal difference learning algorithm for policy evaluation. We establish, to the best of our knowledge, the first non-asymptotic global convergence of actor-critic methods for partially observed systems under function approximation. In particular, in addition to the function approximation and statistical errors that also arise in MDPs, we explicitly characterize the error due to the use of finite-state controllers. This additional error is stated in terms of the total variation distance between the traditional belief state in POMDPs and the posterior distribution of the hidden state when using a finite-state controller. Further, we show that this error can be made small in the case of sliding-block controllers by using larger block sizes.
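The last two sentences of the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' construction: it drops actions (an uncontrolled hidden Markov chain), uses a hypothetical two-state model with made-up matrices `P` and `O`, and approximates the finite-memory posterior by a Bayes filter restarted from a fixed prior over only the last `n` observations. It then measures the total variation distance between that windowed posterior and the full belief state for several window sizes.

```python
import numpy as np

def belief_update(belief, obs, P, O):
    """One Bayes-filter step: predict with P, then correct with the observation.
    P[i, j] = Pr(next state j | state i); O[j, y] = Pr(observation y | state j)."""
    predicted = belief @ P
    unnorm = predicted * O[:, obs]
    return unnorm / unnorm.sum()

def total_variation(p, q):
    """Total variation distance between two distributions on a finite set."""
    return 0.5 * np.abs(p - q).sum()

def windowed_posterior(observations, n, prior, P, O):
    """Posterior from only the last n observations, restarted from `prior`.
    This is an illustrative stand-in for a sliding-block controller's view."""
    b = prior.copy()
    for y in observations[len(observations) - n:]:
        b = belief_update(b, y, P, O)
    return b

def simulate(T, init, P, O, rng):
    """Sample T observations; the state transitions first, then emits."""
    x = rng.choice(len(init), p=init)
    obs = []
    for _ in range(T):
        x = rng.choice(P.shape[1], p=P[x])
        obs.append(rng.choice(O.shape[1], p=O[x]))
    return obs

def tv_gap(obs, n, init, prior, P, O):
    """TV distance between the full belief and the n-step windowed posterior."""
    full = init.copy()
    for y in obs:
        full = belief_update(full, y, P, O)
    return total_variation(full, windowed_posterior(obs, n, prior, P, O))

# Hypothetical model parameters, chosen only for illustration.
P = np.array([[0.9, 0.1], [0.2, 0.8]])   # hidden-state transition matrix
O = np.array([[0.8, 0.2], [0.3, 0.7]])   # observation likelihoods
init = np.array([1.0, 0.0])              # true initial distribution
prior = np.array([0.5, 0.5])             # prior used when the window restarts

rng = np.random.default_rng(0)
obs = simulate(200, init, P, O, rng)
gaps = {n: tv_gap(obs, n, init, prior, P, O) for n in (1, 2, 4, 8, 16)}
for n, gap in gaps.items():
    print(f"window n={n:2d}  TV gap={gap:.6f}")
```

Because every entry of `P` is positive, the filter forgets its initialization quickly, so the gap should shrink as the window grows. The paper's actual error term is defined for controlled chains and finite-state controllers; this sketch only conveys the intuition behind it.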
