Real Time Dynamic Programming (RTDP) is an online algorithm based on Dynamic Programming (DP) that acts by 1-step greedy planning. Unlike DP, RTDP does not require access to the entire state space, i.e., it explicitly handles the exploration. This fact makes RTDP particularly appealing when the state space is large and it is not possible to update all states simultaneously. In this work, we devise a multi-step greedy RTDP algorithm, which we call $h$-RTDP, that replaces the 1-step greedy policy with an $h$-step lookahead policy. We analyze $h$-RTDP in its exact form and establish that increasing the lookahead horizon, $h$, results in an improved sample complexity, at the cost of additional computation. This is the first work to prove an improved sample complexity as a result of {\em increasing} the lookahead horizon in online planning. We then analyze the performance of $h$-RTDP in three approximate settings: approximate model, approximate value updates, and approximate state representation. For these cases, we prove that the asymptotic performance of $h$-RTDP remains the same as that of a corresponding approximate DP algorithm, the best one can hope for without further assumptions on the approximation errors.
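To make the multi-step greedy idea concrete, below is a minimal sketch (not the paper's pseudocode) of an $h$-step lookahead backup used in an RTDP-style loop on a tabular MDP with a known model. The names (\texttt{P}, \texttt{R}, \texttt{lookahead}, \texttt{h\_rtdp\_episode}), the discounted setting, and the single-step execution of the greedy action are illustrative assumptions; the paper's exact update and execution schedule may differ.

\begin{verbatim}
# Hedged sketch of h-step lookahead RTDP on a tabular MDP with known
# transition model P[s, a, s'] and rewards R[s, a]. All names and the
# discounted formulation are illustrative assumptions, not the paper's.
import numpy as np

def lookahead(s, h, P, R, V, gamma):
    """Return the h-step lookahead value and greedy action at state s,
    bootstrapping from the current value estimate V at the horizon."""
    if h == 0:
        return V[s], None
    best_q, best_a = -np.inf, None
    for a in range(P.shape[1]):
        # expected reward plus discounted (h-1)-step lookahead continuation
        q = R[s, a] + gamma * sum(
            P[s, a, s2] * lookahead(s2, h - 1, P, R, V, gamma)[0]
            for s2 in range(P.shape[0]) if P[s, a, s2] > 0
        )
        if q > best_q:
            best_q, best_a = q, a
    return best_q, best_a

def h_rtdp_episode(s0, goal, h, P, R, V, gamma=0.95, max_steps=100):
    """One RTDP-style trajectory: back up the h-step lookahead value at
    each visited state, then act with the h-step greedy action."""
    s = s0
    for _ in range(max_steps):
        if s == goal:
            break
        v, a = lookahead(s, h, P, R, V, gamma)
        V[s] = v                                      # asynchronous backup
        s = np.random.choice(P.shape[0], p=P[s, a])   # follow greedy action
    return V
\end{verbatim}

The sketch highlights the trade-off stated above: a larger $h$ makes each backup more informative (improving sample complexity) while the lookahead computation grows with the horizon.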