Parallel Stochastic Mirror Descent for MDPs - Zhuanzhi Papers


Estimation/Estimator · CASE · Generative Model · Optimizer · Processing (programming language)

July 10, 2021

Parallel Stochastic Mirror Descent for MDPs


Daniil Tiapkin, Alexander Gasnikov

We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs). To this end, we propose a variant of Stochastic Mirror Descent for convex programming problems with Lipschitz-continuous functionals. An important feature is that the method can use inexact values of the functional constraints. We analyze the algorithm in the general case and obtain a convergence-rate estimate in which errors do not accumulate over the iterations of the method. Using this algorithm, we obtain the first parallel algorithm for average-reward MDPs with a generative model. A key feature of the method is its low communication cost in a distributed, centralized setting.
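The abstract describes the method only at a high level. As a rough illustration, the sketch below shows one common form of stochastic mirror descent for a constrained convex problem min f(x) subject to g(x) ≤ 0 over the probability simplex: the negative-entropy mirror map combined with the "productive / non-productive" switching rule often used in mirror descent with functional constraints. This is not the authors' algorithm. The function names, the constant step h = ε/M², the averaging of gradients from simulated workers (a stand-in for the parallel, low-communication setting) and the toy linear problem are all assumptions made for illustration. The constraint oracle `g_value` may be inexact, but the sketch does not model how that error enters the convergence bound, which is what the paper analyzes.

```python
import math
import random


def entropic_mirror_step(x, grad, step):
    """One mirror-descent step on the probability simplex with the negative-entropy
    mirror map: an exponentiated-gradient update followed by renormalization."""
    exps = [-step * g for g in grad]
    shift = max(exps)  # subtracting the max exponent avoids overflow; renormalization cancels it
    w = [xi * math.exp(e - shift) for xi, e in zip(x, exps)]
    total = sum(w)
    return [wi / total for wi in w]


def switching_smd(stoch_grad_f, g_value, subgrad_g, n, eps, lipschitz,
                  iters, workers=4, seed=0):
    """Hypothetical sketch (not the paper's algorithm): stochastic mirror descent with
    productive/non-productive switching for  min f(x)  s.t.  g(x) <= 0  on the simplex.

    stoch_grad_f(x, rng): unbiased stochastic (sub)gradient of f at x
    g_value(x):           value of the constraint functional (may be approximate)
    subgrad_g(x):         a subgradient of g at x
    lipschitz:            assumed bound M on the infinity-norm of all (sub)gradients
    """
    rng = random.Random(seed)
    step = eps / lipschitz ** 2   # constant step size h = eps / M^2 (an assumption)
    x = [1.0 / n] * n             # start at the center of the simplex
    acc = [0.0] * n               # running sum of the productive iterates
    productive = 0
    for _ in range(iters):
        if g_value(x) <= eps:
            # Productive step: each simulated worker draws one stochastic gradient and the
            # "server" averages them, i.e. one n-vector per worker per iteration.
            samples = [stoch_grad_f(x, rng) for _ in range(workers)]
            grad = [sum(col) / workers for col in zip(*samples)]
            acc = [a + xi for a, xi in zip(acc, x)]
            productive += 1
        else:
            # Non-productive step: follow a subgradient of the violated constraint.
            grad = subgrad_g(x)
        x = entropic_mirror_step(x, grad, step)
    if productive == 0:
        return None               # no productive step was taken
    return [a / productive for a in acc]  # equal weights, since the step size is constant


# Toy problem (illustrative only): minimize <c, x> over the simplex subject to x[0] <= 0.5.
# The exact optimum is x* = (0.5, 0.5, 0) with value 1.5.
c = [1.0, 2.0, 3.0]


def stoch_grad_f(x, rng):
    # Gradient of the linear objective plus bounded zero-mean noise.
    return [ci + rng.uniform(-0.5, 0.5) for ci in c]


def g_value(x):
    return x[0] - 0.5


def subgrad_g(x):
    return [1.0, 0.0, 0.0]


x_bar = switching_smd(stoch_grad_f, g_value, subgrad_g, n=3, eps=0.05,
                      lipschitz=3.5, iters=20000)
print([round(v, 3) for v in x_bar])
print("f(x_bar) =", round(sum(ci * xi for ci, xi in zip(c, x_bar)), 3))
```

Because the average is taken only over productive points and g is linear in this toy problem, the returned point satisfies g(x̄) ≤ ε. Its objective value should land within roughly ε of 1.5, possibly slightly below it, since the constraint is enforced only up to ε.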


Related content

Estimation/Estimator

Related VIP content

[ICML2021] Heterogeneous Risk Minimization

Zhuanzhi Member Services

16+ reads · May 21, 2021

[TPAMI2021] Robust Differentiable SVD

Zhuanzhi Member Services

23+ reads · April 10, 2021

Not to be missed! UIUC's latest "Statistical Reinforcement Learning" course!

Zhuanzhi Member Services

53+ reads · September 7, 2020

Introduction to Linux, 96-slide PPT

Zhuanzhi Member Services

81+ reads · July 26, 2020

[ICML2020] On the Generalization Benefit of Noise in Stochastic Gradient Descent

Zhuanzhi Member Services

19+ reads · June 29, 2020

Deep reinforcement learning policy gradient tutorial, 53-slide PPT

Zhuanzhi Member Services

184+ reads · February 1, 2020

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Zhuanzhi Member Services

49+ reads · October 17, 2019

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Zhuanzhi Member Services

36+ reads · October 17, 2019

Latest reinforcement learning tutorial, 17-page PDF

Zhuanzhi Member Services

182+ reads · October 11, 2019

[AI in 2019: A Year in Review] Anti-AI

Zhuanzhi Member Services

79+ reads · October 10, 2019

Related news

Transferring Knowledge across Learning Processes

CreateAMind

29+ reads · May 18, 2019

Curiosity in animal brains and curiosity in reinforcement learning

CreateAMind

10+ reads · January 26, 2019

Unsupervised Meta-Learning for reinforcement learning

CreateAMind

18+ reads · January 7, 2019

[OpenAI] List of key papers in deep reinforcement learning

Zhuanzhi

11+ reads · November 10, 2018

A major improvement to conditional GANs! cGANs with Projection Discriminator

CreateAMind

8+ reads · February 7, 2018

carla study notes

CreateAMind

9+ reads · February 7, 2018

Getting started with distributed TensorFlow

Machine Learning Research Society

4+ reads · November 28, 2017

[Paper] A summary of variational inference

Machine Learning Research Society

39+ reads · November 16, 2017

Auto-Encoding GAN

CreateAMind

7+ reads · August 4, 2017

[New today] 8 IEEE Trans. special-issue submission deadlines

Call4Papers

7+ reads · June 29, 2017

Related papers

AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax Optimization

arXiv

0+ reads · September 14, 2021

DSDF: An approach to handle stochastic agents in collaborative multi-agent reinforcement learning

arXiv

0+ reads · September 14, 2021

Error analysis for 2D stochastic Navier--Stokes equations in bounded domains

arXiv

0+ reads · September 14, 2021

Resource Optimization with Interference Coupling in Multi-IRS-assisted Multi-cell Systems

arXiv

0+ reads · September 13, 2021

Runtime Analysis of Single- and Multi-Objective Evolutionary Algorithms for Chance Constrained Optimization Problems with Normally Distributed Random Variables

arXiv

1+ reads · September 13, 2021

Estimates on the generalization error of Physics Informed Neural Networks (PINNs) for approximating PDEs

arXiv

1+ reads · September 10, 2021

Two-derivative deferred correction time discretization for the discontinuous Galerkin method

arXiv

0+ reads · September 10, 2021

A Dynamic Scheduling Policy for a Network with Heterogeneous Time-Sensitive Traffic

arXiv

0+ reads · September 10, 2021

Robust Differentiable SVD

arXiv

9+ reads · April 8, 2021

Bipedal Walking Robot using Deep Deterministic Policy Gradient

arXiv

3+ reads · July 16, 2018
