Watkins and Dayan's Q-learning is a model-free reinforcement learning algorithm that iteratively refines an estimate of the optimal action-value function of an MDP by stochastically "visiting" many state-action pairs [Watkins and Dayan, 1992]. Variants of the algorithm lie at the heart of numerous recent state-of-the-art achievements in reinforcement learning, including the superhuman Atari-playing deep Q-network [Mnih et al., 2015]. The goal of this paper is to reproduce a precise and (nearly) self-contained proof that Q-learning converges. Much of the available literature leverages powerful theory to obtain highly generalizable results in this vein. However, this approach requires the reader to be familiar with, and draw deep connections between, several different research areas. A student seeking to deepen their understanding of Q-learning risks becoming caught in a vicious cycle of "RL-learning Hell". For this reason, we give a complete proof from start to finish using only one external result from the field of stochastic approximation, even though this minimal dependence on other results comes at the expense of some "shininess".
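For orientation, the tabular update at the heart of the algorithm can be sketched as follows (the notation used here, a step size $\alpha_t$, a discount factor $\gamma$, and an observed reward $r_t$, is chosen for illustration and is defined precisely in the sections that follow):
\[
Q_{t+1}(s_t, a_t) \;=\; (1 - \alpha_t)\, Q_t(s_t, a_t) \;+\; \alpha_t \Big( r_t + \gamma \max_{a'} Q_t(s_{t+1}, a') \Big),
\]
with $Q_{t+1}(s, a) = Q_t(s, a)$ for every state-action pair not visited at time $t$.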