在 " 过滤稳定 " 下,POMDPs的 " 有限记忆学习 " 与 " 接近最佳 " 的 " 筛选稳定 " 下 " 总结政策 " 的 " 有限记忆 " 学习 (Convergence of Finite Memory Q-Learning for POMDPs and Near Optimality of Learned Policies under Filter Stability) - 专知论文

会员服务 ·

0

优化器 · Performer · 不动点方程 · 近似 · 近似误差 ·

2021 年 10 月 2 日

Convergence of Finite Memory Q-Learning for POMDPs and Near Optimality of Learned Policies under Filter Stability

翻译：在 " 过滤稳定 " 下,POMDPs的 " 有限记忆学习 " 与 " 接近最佳 " 的 " 筛选稳定 " 下 " 总结政策 " 的 " 有限记忆 " 学习

Ali Devran Kara,Serdar Yuksel

from arxiv, 32 pages, 12 figures. arXiv admin note: text overlap with arXiv:2010.07452

In this paper, for POMDPs, we provide the convergence of a Q learning algorithm for control policies using a finite history of past observations and control actions, and, consequentially, we establish near optimality of such limit Q functions under explicit filter stability conditions. We present explicit error bounds relating the approximation error to the length of the finite history window. We establish the convergence of such Q-learning iterations under mild ergodicity assumptions on the state process during the exploration phase. We further show that the limit fixed point equation gives an optimal solution for an approximate belief-MDP. We then provide bounds on the performance of the policy obtained using the limit Q values compared to the performance of the optimal policy for the POMDP, where we also present explicit conditions using recent results on filter stability in controlled POMDPs. While there exist many experimental results, (i) the rigorous asymptotic convergence (to an approximate MDP value function) for such finite-memory Q-learning algorithms, and (ii) the near optimality with an explicit rate of convergence (in the memory size) are results that are new to the literature, to our knowledge.

翻译：在本文中,对于POMDPs,我们利用过去观测和控制行动的有限历史,为控制政策提供了Q学习算法的趋同,因此,我们在明确的过滤稳定条件下建立了这种限制Q功能的近似最佳性能。我们提出了将近似差错与有限历史窗口的长度相联系的明确错误界限。我们根据在勘探阶段国家进程的轻度异端假设建立了这种Q学习迭代的趋同。我们进一步表明,定点方程式为大约的信仰-MDP提供了最佳解决办法。然后,我们提供了使用限制Q值获得的政策绩效与POMDP最佳政策绩效的界限,我们在该范围内还利用受控制的POMDPs中过滤稳定性的最新结果提出了明确条件。虽然有许多实验结果,但(一) 对这种有限的模拟Q-学习算法的严格性归结(大致为MDP值函数),以及(二) 与明确的趋同率(记忆大小)的接近最佳性能是我们文献的新结果。

0

相关内容

优化器

【南京大学】量子计算 (Spring 2021)课程

专知会员服务

59+阅读 · 2021年4月12日

【AAAI2021】自校正Q学习，Self-correcting Q-Learning

专知会员服务

17+阅读 · 2020年12月4日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

专知会员服务

17+阅读 · 2020年7月14日

【ICML2020-伯克利-马毅老师组】深度等距学习的视觉识别，Deep Isometric Learning for Visual Recognition

【ICML2020-伯克利-马毅老师组】深度等距学习的视觉识别，Deep Isometric Learning for Visual Recognition

专知会员服务

25+阅读 · 2020年7月1日

【新书】人工智能Python代码，227页pdf，Python code for Artificial Intelligence: Foundations of Computational Agents

【新书】人工智能Python代码，227页pdf，Python code for Artificial Intelligence: Foundations of Computational Agents

专知会员服务

103+阅读 · 2020年6月21日

【论文推荐】Stochastic Graph Neural Networks，随机图神经网络

【论文推荐】Stochastic Graph Neural Networks，随机图神经网络

专知会员服务

69+阅读 · 2020年6月6日

【单样本(One-shot)学习】《One-shot learning》by Pragati Baheti Part 1/2: Definitions and fundamental techniques

【单样本(One-shot)学习】《One-shot learning》by Pragati Baheti Part 1/2: Definitions and fundamental techniques

专知会员服务

30+阅读 · 2020年4月22日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

强化学习扫盲贴：从Q-learning到DQN

强化学习扫盲贴：从Q-learning到DQN

夕小瑶的卖萌屋

52+阅读 · 2019年10月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Reinforcement Learning: An Introduction 2018第二版 500页

Reinforcement Learning: An Introduction 2018第二版 500页

CreateAMind

14+阅读 · 2018年4月27日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

On Linear Convergence of Policy Gradient Methods for Finite MDPs

On Linear Convergence of Policy Gradient Methods for Finite MDPs

Arxiv

0+阅读 · 2021年12月13日

Scheduling Servers with Stochastic Bilinear Rewards

Arxiv

0+阅读 · 2021年12月13日

Convergence and Stability of the Stochastic Proximal Point Algorithm with Momentum

Arxiv

0+阅读 · 2021年12月10日

Model Selection for Generic Reinforcement Learning

Arxiv

0+阅读 · 2021年12月10日

Convergence Results For Q-Learning With Experience Replay

Arxiv

0+阅读 · 2021年12月8日

Interpolating between BSDEs and PINNs -- deep learning for elliptic and parabolic boundary value problems

Arxiv

0+阅读 · 2021年12月7日

Recurrent Neural Network Controllers Synthesis with Stability Guarantees for Partially Observed Systems

Arxiv

0+阅读 · 2021年12月7日

Functional Regularization for Reinforcement Learning via Learned Fourier Features

Arxiv

0+阅读 · 2021年12月6日

Reinforcement Learning for Adaptive Optimal Stationary Control of Linear Stochastic Systems

Arxiv

0+阅读 · 2021年12月5日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

VIP会员

文章信息

相关主题

不动点方程

相关VIP内容

【南京大学】量子计算 (Spring 2021)课程

专知会员服务

59+阅读 · 2021年4月12日

【AAAI2021】自校正Q学习，Self-correcting Q-Learning

专知会员服务

17+阅读 · 2020年12月4日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

【ICML2020-伯克利】稳定非策略强化学习的表示，Representations for Stable Off-Policy Reinforcement Learning

专知会员服务

17+阅读 · 2020年7月14日

【ICML2020-伯克利-马毅老师组】深度等距学习的视觉识别，Deep Isometric Learning for Visual Recognition

【ICML2020-伯克利-马毅老师组】深度等距学习的视觉识别，Deep Isometric Learning for Visual Recognition

专知会员服务

25+阅读 · 2020年7月1日

【新书】人工智能Python代码，227页pdf，Python code for Artificial Intelligence: Foundations of Computational Agents

【新书】人工智能Python代码，227页pdf，Python code for Artificial Intelligence: Foundations of Computational Agents

专知会员服务

103+阅读 · 2020年6月21日

【论文推荐】Stochastic Graph Neural Networks，随机图神经网络

【论文推荐】Stochastic Graph Neural Networks，随机图神经网络

专知会员服务

69+阅读 · 2020年6月6日

【单样本(One-shot)学习】《One-shot learning》by Pragati Baheti Part 1/2: Definitions and fundamental techniques

【单样本(One-shot)学习】《One-shot learning》by Pragati Baheti Part 1/2: Definitions and fundamental techniques

专知会员服务

30+阅读 · 2020年4月22日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

《俄乌战争中的无人系统：新的战争方式与新兴趋势——来自前线的印象》报告

《海上自主水面船舶远程操作中心：安全可持续运行的多维度分析》

多模态大语言模型下游调优中“保持自我”的重要性

隐身自主无人水下航行器技术如何变革水下作战并重塑海军竞争

相关资讯

强化学习扫盲贴：从Q-learning到DQN

强化学习扫盲贴：从Q-learning到DQN

夕小瑶的卖萌屋

52+阅读 · 2019年10月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Reinforcement Learning: An Introduction 2018第二版 500页

Reinforcement Learning: An Introduction 2018第二版 500页

CreateAMind

14+阅读 · 2018年4月27日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

On Linear Convergence of Policy Gradient Methods for Finite MDPs

On Linear Convergence of Policy Gradient Methods for Finite MDPs

Arxiv

0+阅读 · 2021年12月13日

Scheduling Servers with Stochastic Bilinear Rewards

Arxiv

0+阅读 · 2021年12月13日

Convergence and Stability of the Stochastic Proximal Point Algorithm with Momentum

Arxiv

0+阅读 · 2021年12月10日

Model Selection for Generic Reinforcement Learning

Arxiv

0+阅读 · 2021年12月10日

Convergence Results For Q-Learning With Experience Replay

Arxiv

0+阅读 · 2021年12月8日

Interpolating between BSDEs and PINNs -- deep learning for elliptic and parabolic boundary value problems

Arxiv

0+阅读 · 2021年12月7日

Recurrent Neural Network Controllers Synthesis with Stability Guarantees for Partially Observed Systems

Arxiv

0+阅读 · 2021年12月7日

Functional Regularization for Reinforcement Learning via Learned Fourier Features

Arxiv

0+阅读 · 2021年12月6日

Reinforcement Learning for Adaptive Optimal Stationary Control of Linear Stochastic Systems

Arxiv

0+阅读 · 2021年12月5日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

微信扫码咨询专知VIP会员