We consider the problem of local planning in fixed-horizon and discounted Markov Decision Processes (MDPs) with linear function approximation and a generative model, under the assumption that the optimal action-value function lies in the span of a feature map that is available to the planner. Previous work has left open the question of whether there exist sound planners that need only $\mathrm{poly}(H,d)$ queries regardless of the MDP, where $H$ is the horizon and $d$ is the dimensionality of the features. We answer this question in the negative: we show that any sound planner must query at least $\min(\exp(\Omega(d)), \Omega(2^H))$ samples in the fixed-horizon setting and $\exp(\Omega(d))$ samples in the discounted setting. We also show that for any $\delta>0$, the least-squares value iteration algorithm with $O(H^5 d^{H+1}/\delta^2)$ queries can compute a $\delta$-optimal policy in the fixed-horizon setting. We discuss implications and remaining open questions.
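The upper bound above is attained by least-squares value iteration (LSVI) with a generative model: proceeding backward over the horizon, the planner samples transitions, forms regression targets $r + \max_{a'} \phi(s',a')^\top \theta_{h+1}$, and fits $\theta_h$ by least squares. The following is a minimal illustrative sketch of this backward-induction scheme on a hypothetical toy MDP; the instance, the one-hot feature map (chosen so realizability holds trivially), and the function names (`simulate`, `features`, `lsvi`) are our own assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
H = 3                                  # horizon
n_states, n_actions = 2, 2
d = n_states * n_actions               # feature dimension (tabular one-hot)

# Hypothetical toy MDP: P[s][a] = next-state distribution, R[s][a] = reward.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.5, 0.5], [0.2, 0.8]]])
R = np.array([[1.0, 0.0], [0.0, 1.0]])

def features(s, a):
    # One-hot feature map: Q*_h is exactly linear in phi, so the
    # realizability assumption from the abstract holds trivially here.
    phi = np.zeros(d)
    phi[s * n_actions + a] = 1.0
    return phi

def simulate(s, a):
    # One generative-model query: a sampled transition (s, a) -> (r, s').
    s_next = rng.choice(n_states, p=P[s, a])
    return R[s, a], s_next

def lsvi(n_samples=2000):
    # Backward induction: fit theta_h by least squares on targets
    # r + max_b phi(s', b)^T theta_{h+1}, using fresh generative queries.
    thetas = [np.zeros(d) for _ in range(H + 1)]  # theta_H = 0
    for h in range(H - 1, -1, -1):
        X, y = [], []
        for _ in range(n_samples):
            s, a = rng.integers(n_states), rng.integers(n_actions)
            r, s_next = simulate(s, a)
            target = r + max(features(s_next, b) @ thetas[h + 1]
                             for b in range(n_actions))
            X.append(features(s, a))
            y.append(target)
        thetas[h], *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return thetas

thetas = lsvi()
# Greedy action at stage 0 in state 0 under the fitted linear Q-function.
best_a = max(range(n_actions), key=lambda a: features(0, a) @ thetas[0])
print("theta_0 =", thetas[0], "greedy action in state 0:", best_a)
```

With tabular features the regression is trivially well-specified; the paper's hardness results concern the general linear case, where the $d^{H+1}$ sample cost of such backward fitting cannot be avoided by any sound planner.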