June 13, 2021

Towards Tight Bounds on the Sample Complexity of Average-reward MDPs


Yujia Jin, Aaron Sidford

We prove new upper and lower bounds for the sample complexity of finding an $\epsilon$-optimal policy of an infinite-horizon average-reward Markov decision process (MDP) given access to a generative model. When the mixing time of the probability transition matrix of every policy is at most $t_\mathrm{mix}$, we provide an algorithm that solves the problem using $\widetilde{O}(t_\mathrm{mix} \epsilon^{-3})$ (oblivious) samples per state-action pair. Further, we provide a lower bound showing that a linear dependence on $t_\mathrm{mix}$ is necessary in the worst case for any algorithm that computes oblivious samples. We obtain our results by establishing connections between infinite-horizon average-reward MDPs and discounted MDPs, which may be of further utility.
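The connection to discounted MDPs mentioned in the abstract can be illustrated with a small sketch. The abstract does not spell out the algorithm, so everything below is an assumption made for illustration: the function names, the discount choice $\gamma = 1 - \epsilon / t_\mathrm{mix}$ (with constants ignored), and the use of plain value iteration on an empirical model in place of whatever solver the authors analyze. The sampling is oblivious in the sense that the number of draws per state-action pair is fixed in advance.

```python
import random


def uniform_sampler(n_states):
    """Hypothetical generative model: next state is uniform, whatever (s, a) is."""
    return lambda s, a: random.randrange(n_states)


def average_reward_policy_via_discounting(sample_next_state, rewards, t_mix, eps,
                                          samples_per_pair, iterations=500):
    """Illustrative reduction: solve a discounted surrogate of an average-reward MDP.

    rewards[s][a] is the reward for action a in state s;
    sample_next_state(s, a) draws one next state from the generative model.
    """
    n_states, n_actions = len(rewards), len(rewards[0])
    # Assumed discount: effective horizon 1 / (1 - gamma) = t_mix / eps.
    gamma = 1.0 - eps / t_mix

    # Oblivious sampling: the same, pre-fixed number of draws per (s, a).
    p_hat = [[[0.0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(samples_per_pair):
                p_hat[s][a][sample_next_state(s, a)] += 1.0 / samples_per_pair

    def q_values(s, v):
        # One-step lookahead under the empirical transition model.
        return [rewards[s][a] + gamma * sum(p * x for p, x in zip(p_hat[s][a], v))
                for a in range(n_actions)]

    # Value iteration on the empirical discounted MDP (a stand-in solver).
    v = [0.0] * n_states
    for _ in range(iterations):
        v = [max(q_values(s, v)) for s in range(n_states)]

    # Greedy policy with respect to the final value estimate.
    policy = []
    for s in range(n_states):
        q = q_values(s, v)
        policy.append(max(range(n_actions), key=lambda a: q[a]))
    return policy, gamma
```

For instance, on a two-state MDP with `rewards = [[0.0, 1.0], [0.0, 1.0]]` and `uniform_sampler(2)`, the returned policy picks the rewarding action in both states. The sketch makes no attempt to match the $\widetilde{O}(t_\mathrm{mix} \epsilon^{-3})$ sample bound; `samples_per_pair` is left to the caller.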

