Finite Horizon Q-learning: Stability, Convergence, Simulations and an application on Smart Grids - 专知 Papers

August 6, 2022

Finite Horizon Q-learning: Stability, Convergence, Simulations and an application on Smart Grids

Vivek VP, Dr. Shalabh Bhatnagar

Q-learning is a popular reinforcement learning algorithm. However, it has been studied and analysed mainly in the infinite-horizon setting, whereas several important applications can be modeled as finite-horizon Markov decision processes (MDPs). We develop a version of the Q-learning algorithm for finite-horizon MDPs and provide a full proof of its stability and convergence. Our analysis of the stability and convergence of finite-horizon Q-learning is based entirely on the ordinary differential equation (O.D.E.) method. We also demonstrate the performance of our algorithm in a random MDP setting as well as on an application to smart grids.
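To make the finite-horizon setting concrete, here is a minimal tabular sketch in Python. It is not the authors' algorithm or their experimental setup, only an illustration under stated assumptions: the Q-table is indexed by stage h as well as by state and action, the terminal reward is zero, there is no discounting, rewards are deterministic and lie in [0, 1], the step size is 1/n, and a simulator can sample a transition from any state-action pair. All function names (`make_random_mdp`, `backward_induction`, `finite_horizon_q_learning`, `max_abs_gap`) are hypothetical. Exact backward induction is included only as a reference to compare the learned values against.

```python
import random


def make_random_mdp(n_states, n_actions, rng):
    """Return (P, R): P[s][a] is a probability list over next states,
    R[s][a] is a deterministic reward in [0, 1]. Illustrative only."""
    P, R = [], []
    for _ in range(n_states):
        p_row, r_row = [], []
        for _ in range(n_actions):
            weights = [rng.random() + 1e-3 for _ in range(n_states)]
            total = sum(weights)
            p_row.append([w / total for w in weights])
            r_row.append(rng.random())
        P.append(p_row)
        R.append(r_row)
    return P, R


def backward_induction(P, R, horizon):
    """Exact stage-dependent Q-values by dynamic programming
    (assumes zero terminal reward, no discounting)."""
    n_states, n_actions = len(P), len(P[0])
    Q = [[[0.0] * n_actions for _ in range(n_states)]
         for _ in range(horizon + 1)]
    for h in range(horizon - 1, -1, -1):
        for s in range(n_states):
            for a in range(n_actions):
                expected_next = sum(
                    p * max(Q[h + 1][s2]) for s2, p in enumerate(P[s][a])
                )
                Q[h][s][a] = R[s][a] + expected_next
    return Q


def finite_horizon_q_learning(P, R, horizon, n_sweeps, rng):
    """Sample-based Q-learning with one Q-table per stage.
    Assumption: a simulator lets us draw s' ~ P[s][a] for any (s, a)."""
    n_states, n_actions = len(P), len(P[0])
    # Q[horizon] is the terminal stage and stays at zero.
    Q = [[[0.0] * n_actions for _ in range(n_states)]
         for _ in range(horizon + 1)]
    for n in range(1, n_sweeps + 1):
        alpha = 1.0 / n  # diminishing step size
        for h in range(horizon - 1, -1, -1):
            for s in range(n_states):
                for a in range(n_actions):
                    s_next = rng.choices(range(n_states), weights=P[s][a])[0]
                    # The target bootstraps from the NEXT stage's table.
                    target = R[s][a] + max(Q[h + 1][s_next])
                    Q[h][s][a] += alpha * (target - Q[h][s][a])
    return Q


def max_abs_gap(Q1, Q2):
    """Largest absolute difference between two stage-indexed Q-tables."""
    return max(
        abs(x - y)
        for t1, t2 in zip(Q1, Q2)
        for r1, r2 in zip(t1, t2)
        for x, y in zip(r1, r2)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    P, R = make_random_mdp(n_states=3, n_actions=2, rng=rng)
    Q_star = backward_induction(P, R, horizon=3)
    Q_hat = finite_horizon_q_learning(P, R, horizon=3, n_sweeps=2000, rng=rng)
    print("max |Q_hat - Q_star| =", max_abs_gap(Q_hat, Q_star))
```

The main difference from the infinite-horizon update is that the value at stage h bootstraps from the table at stage h + 1 rather than from itself, so the last stage depends only on the immediate reward and earlier stages inherit their accuracy from later ones.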

