Parallel Stochastic Mirror Descent for MDPs - Zhuanzhi (专知) Papers


April 13, 2021

Parallel Stochastic Mirror Descent for MDPs


Daniil Tiapkin, Fedor Stonyakin, Alexander Gasnikov

We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs). To this end, we propose a variant of Stochastic Mirror Descent for convex programming problems with Lipschitz-continuous functionals. An important detail is that the method can use inexact values of the functional constraints. We analyze the algorithm in the general case and obtain a convergence-rate estimate in which errors do not accumulate as the method runs. Using this algorithm, we obtain the first parallel algorithm for average-reward MDPs with a generative model. One of the main features of the presented method is its low communication cost in a distributed centralized setting.
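The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea it names: stochastic mirror descent for a convex problem with one functional constraint whose value is observed inexactly. It assumes the standard switching scheme (step along the objective when the observed constraint is nearly satisfied, otherwise step along the constraint) with the entropy setup on the probability simplex, applied to a toy linear problem. The function names, parameters, step-size rule, and noise model are illustrative assumptions, not the authors' algorithm or its MDP formulation.

```python
import math
import random


def entropic_step(x, grad, step):
    """One mirror-descent step on the simplex with the entropy prox function."""
    # Multiplicative update; shift the exponents so the largest one is 0.
    exps = [-step * g for g in grad]
    shift = max(exps)
    weights = [xi * math.exp(e - shift) for xi, e in zip(x, exps)]
    total = sum(weights)
    return [w / total for w in weights]


def switching_smd(c, a, b, eps, n_iters,
                  grad_noise=0.1, constraint_error=0.0, seed=0):
    """Approximately minimize <c, x> over the simplex s.t. <a, x> - b <= 0.

    Hypothetical toy setup: the objective gradient is observed with
    bounded noise, and the constraint value with bounded error.
    """
    rng = random.Random(seed)
    n = len(c)
    x = [1.0 / n] * n  # start at the uniform distribution
    # Sup-norm bound on the gradients used below (assumed step-size rule).
    bound = max(max(abs(v) for v in c) + grad_noise,
                max(abs(v) for v in a))
    step = eps / (bound * bound)

    running_sum = [0.0] * n
    productive = 0
    for _ in range(n_iters):
        # Inexact constraint value: true value plus a bounded error.
        g_true = sum(ai * xi for ai, xi in zip(a, x)) - b
        g_seen = g_true + rng.uniform(-constraint_error, constraint_error)
        if g_seen <= eps:
            # Productive step: record the point, move along a noisy
            # gradient of the objective.
            productive += 1
            running_sum = [s + xi for s, xi in zip(running_sum, x)]
            grad = [ci + rng.uniform(-grad_noise, grad_noise) for ci in c]
        else:
            # Non-productive step: move along the constraint gradient.
            grad = a
        x = entropic_step(x, grad, step)

    if productive == 0:
        raise RuntimeError("no productive steps; increase n_iters or eps")
    # Output the average of the points visited on productive steps.
    return [s / productive for s in running_sum]


if __name__ == "__main__":
    point = switching_smd(c=[1.0, 0.0, 2.0], a=[0.0, 1.0, 0.0], b=0.5,
                          eps=0.05, n_iters=20000, constraint_error=0.01)
    print(point)
```

Because every recorded point had an observed constraint value of at most `eps`, and the constraint in this toy problem is linear, the averaged output violates the true constraint by at most `eps + constraint_error`. The constraint error enters that bound once and does not grow with the number of iterations, which is the kind of non-accumulation the abstract refers to.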

