Value iteration is a fixed point iteration technique used to compute the optimal value function and policy of a discounted reward Markov Decision Process (MDP). A contraction operator is constructed and applied repeatedly to arrive at the optimal solution. Value iteration is a first-order method and may therefore require a large number of iterations to converge to the optimal solution. Successive relaxation is a popular technique for solving fixed point equations. It has been shown in the literature that, under a special structure of the MDP, the successive over-relaxation technique computes the optimal value function faster than standard value iteration. In this work, we propose a second-order value iteration procedure obtained by applying the Newton-Raphson method to the successive relaxation value iteration scheme. We prove that our algorithm converges globally and asymptotically to the optimal solution and establish its second-order rate of convergence. Through experiments, we demonstrate the effectiveness of the proposed approach.
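As a rough illustration of the objects the abstract refers to, the sketch below implements standard value iteration, a successive-relaxation variant of it, and a single Newton-Raphson step on the Bellman fixed-point equation for a small randomly generated MDP. It is a minimal sketch under assumed parameters (state and action counts, discount factor gamma, relaxation factor w), not the authors' algorithm: the paper applies the Newton-Raphson step to the successive relaxation operator, whereas the step shown here is for the plain Bellman operator, where it reduces to evaluating the greedy policy exactly.

```python
# Illustrative sketch (not the authors' implementation): standard value iteration,
# successive-relaxation value iteration, and one Newton-Raphson step on the Bellman
# fixed-point equation, run on a small randomly generated MDP.  The MDP sizes,
# discount factor gamma, and relaxation factor w are assumptions.
import numpy as np


def bellman_operator(V, P, R, gamma):
    """Bellman optimality operator T: (TV)(s) = max_a [R(s,a) + gamma * sum_s' P(s,a,s') V(s')]."""
    return (R + gamma * P @ V).max(axis=1)


def value_iteration(P, R, gamma, tol=1e-8, max_iter=100_000):
    """Standard (first-order) value iteration: iterate V <- TV to the fixed point."""
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        V_next = bellman_operator(V, P, R, gamma)
        if np.max(np.abs(V_next - V)) < tol:
            break
        V = V_next
    return V_next


def sor_value_iteration(P, R, gamma, w, tol=1e-8, max_iter=100_000):
    """Successive relaxation applied to V = TV: iterate V <- (1 - w) V + w TV.
    Over-relaxation (w > 1) can speed up convergence when every state has a
    positive self-transition probability, the kind of special MDP structure the
    abstract alludes to; w <= 1 is always safe."""
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        V_next = (1.0 - w) * V + w * bellman_operator(V, P, R, gamma)
        if np.max(np.abs(V_next - V)) < tol:
            break
        V = V_next
    return V_next


def newton_step(V, P, R, gamma):
    """One Newton-Raphson step on F(V) = TV - V = 0.  For the plain Bellman
    operator this amounts to evaluating the greedy policy exactly, i.e. solving
    (I - gamma * P_pi) V = r_pi; the paper applies the same idea to the
    successive-relaxation operator instead."""
    S = P.shape[0]
    q = R + gamma * P @ V
    pi = q.argmax(axis=1)                 # greedy policy with respect to V
    P_pi = P[np.arange(S), pi, :]         # (S, S) transition matrix under pi
    r_pi = R[np.arange(S), pi]
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, gamma = 5, 3, 0.9               # assumed problem sizes and discount factor
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)     # row-stochastic transition kernel with P[s, a, s'] > 0
    R = rng.random((S, A))
    # Over-relaxation factor of the form used in the SOR value-iteration literature:
    # w = 1 / (1 - gamma * min_{s,a} P(s | s, a)).
    w = 1.0 / (1.0 - gamma * P[np.arange(S), :, np.arange(S)].min())
    print("value iteration:        ", np.round(value_iteration(P, R, gamma), 4))
    print("successive relaxation:  ", np.round(sor_value_iteration(P, R, gamma, w), 4))
    print("one Newton step from 0: ", np.round(newton_step(np.zeros(S), P, R, gamma), 4))
```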