Q-学习分立时间切换系统分析 (A Discrete-Time Switching System Analysis of Q-learning) - 专知论文

会员服务 ·

0

欠估计 · 过估计 · 估计误差 · 估计/估计量 · 人工智能 ·

2021 年 5 月 26 日

A Discrete-Time Switching System Analysis of Q-learning

翻译：Q-学习分立时间切换系统分析

Donghwan Lee,Jianghai Hu,Niao He

This paper develops a novel control-theoretic framework to analyze the non-asymptotic convergence of Q-learning. We show that the dynamics of asynchronous Q-learning with a constant step-size can be naturally formulated as a discrete-time stochastic affine switching system. Moreover, the evolution of the Q-learning estimation error is over- and underestimated by trajectories of two simpler dynamical systems. Based on these two systems, we derive a new finite-time error bound of asynchronous Q-learning when a constant stepsize is used. Our analysis also sheds light on the overestimation phenomenon of Q-learning. We further illustrate and validate the analysis through numerical simulations.

翻译：本文开发了一个新的控制理论框架, 用于分析Q- 学习的非同步融合。我们显示, 具有恒定步尺寸的无同步Q学习的动态可以自然地被设计成一个离散时间切开的松动切片切换系统。此外, Q- 学习估计错误的演变被两个更简单的动态系统的轨迹所过度和低估。在这两个系统的基础上, 我们得出了一个新的有限时间错误, 在使用恒定步态时, 将不同步Q- 学习捆绑在一起。我们的分析还揭示了Q- 学习的过高估计现象。我们通过数字模拟进一步说明和验证分析。

0

相关内容

欠估计

【ICML2020-上海交大】多智能体确定性Q-Learning， Multi-Agent Determinantal Q-Learning

【ICML2020-上海交大】多智能体确定性Q-Learning， Multi-Agent Determinantal Q-Learning

专知会员服务

38+阅读 · 2020年6月3日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

【开放书】部分观测动态系统的贝叶斯学习，119页pdf，Bayesian Learning for partially observed dynamical systems

【开放书】部分观测动态系统的贝叶斯学习，119页pdf，Bayesian Learning for partially observed dynamical systems

专知会员服务

41+阅读 · 2019年12月27日

【金融机器学习课程资料】Financial Machine Learning

专知会员服务

119+阅读 · 2019年12月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

强化学习扫盲贴：从Q-learning到DQN

强化学习扫盲贴：从Q-learning到DQN

夕小瑶的卖萌屋

52+阅读 · 2019年10月13日

Github项目推荐 | 最优控制、强化学习和运动规划等主题参考文献集锦

Github项目推荐 | 最优控制、强化学习和运动规划等主题参考文献集锦

AI研习社

3+阅读 · 2019年4月21日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

随波逐流：Similarity-Adaptive and Discrete Optimization

随波逐流：Similarity-Adaptive and Discrete Optimization

我爱读PAMI

5+阅读 · 2018年2月6日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Partially Observable Markov Decision Processes (POMDPs) and Robotics

Arxiv

0+阅读 · 2021年7月15日

Towards a Dimension-Free Understanding of Adaptive Linear Control

Arxiv

0+阅读 · 2021年7月15日

Performance Analysis on Machine Learning-Based Channel Estimation

Arxiv

0+阅读 · 2021年7月14日

A Framework for Machine Learning of Model Error in Dynamical Systems

A Framework for Machine Learning of Model Error in Dynamical Systems

Arxiv

0+阅读 · 2021年7月14日

Fast Parallel-in-Time Quasi-Boundary Value Methods for Backward Heat Conduction Problems

Arxiv

0+阅读 · 2021年7月13日

Scalable Biophysical Simulations of the Neuromuscular System

Arxiv

0+阅读 · 2021年7月13日

Self-correcting Q-Learning

Arxiv

11+阅读 · 2020年12月2日

Lipschitz Lifelong Reinforcement Learning

Arxiv

4+阅读 · 2020年1月17日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

Accelerated Reinforcement Learning

Arxiv

6+阅读 · 2018年4月24日

VIP会员

文章信息

相关主题

估计/估计量

相关VIP内容

【ICML2020-上海交大】多智能体确定性Q-Learning， Multi-Agent Determinantal Q-Learning

【ICML2020-上海交大】多智能体确定性Q-Learning， Multi-Agent Determinantal Q-Learning

专知会员服务

38+阅读 · 2020年6月3日

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

【基于模型的强化学习的博弈论框架】A Game Theoretic Framework for Model Based Reinforcement Learning

专知会员服务

131+阅读 · 2020年4月19日

【开放书】部分观测动态系统的贝叶斯学习，119页pdf，Bayesian Learning for partially observed dynamical systems

【开放书】部分观测动态系统的贝叶斯学习，119页pdf，Bayesian Learning for partially observed dynamical systems

专知会员服务

41+阅读 · 2019年12月27日

【金融机器学习课程资料】Financial Machine Learning

专知会员服务

119+阅读 · 2019年12月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

Deep Research（深度研究）：系统性综述

《革新战术战场空间能力：反无人机系统》报告

【普林斯顿博士论文】用于语音的生成式通用模型

螺旋式开发作为战略资产：美军启示

相关资讯

强化学习扫盲贴：从Q-learning到DQN

强化学习扫盲贴：从Q-learning到DQN

夕小瑶的卖萌屋

52+阅读 · 2019年10月13日

Github项目推荐 | 最优控制、强化学习和运动规划等主题参考文献集锦

Github项目推荐 | 最优控制、强化学习和运动规划等主题参考文献集锦

AI研习社

3+阅读 · 2019年4月21日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

随波逐流：Similarity-Adaptive and Discrete Optimization

随波逐流：Similarity-Adaptive and Discrete Optimization

我爱读PAMI

5+阅读 · 2018年2月6日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Partially Observable Markov Decision Processes (POMDPs) and Robotics

Arxiv

0+阅读 · 2021年7月15日

Towards a Dimension-Free Understanding of Adaptive Linear Control

Arxiv

0+阅读 · 2021年7月15日

Performance Analysis on Machine Learning-Based Channel Estimation

Arxiv

0+阅读 · 2021年7月14日

A Framework for Machine Learning of Model Error in Dynamical Systems

A Framework for Machine Learning of Model Error in Dynamical Systems

Arxiv

0+阅读 · 2021年7月14日

Fast Parallel-in-Time Quasi-Boundary Value Methods for Backward Heat Conduction Problems

Arxiv

0+阅读 · 2021年7月13日

Scalable Biophysical Simulations of the Neuromuscular System

Arxiv

0+阅读 · 2021年7月13日

Self-correcting Q-Learning

Arxiv

11+阅读 · 2020年12月2日

Lipschitz Lifelong Reinforcement Learning

Arxiv

4+阅读 · 2020年1月17日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

Accelerated Reinforcement Learning

Arxiv

6+阅读 · 2018年4月24日

微信扫码咨询专知VIP会员