Adaptive Rollout Length for Model-Based RL Using Model-Free Deep RL

June 7, 2022

Abhinav Bhatia, Philip S. Thomas, Shlomo Zilberstein

Model-based reinforcement learning promises to learn an optimal policy from fewer interactions with the environment than model-free reinforcement learning, because it learns an intermediate model of the environment and uses it to predict future interactions. When predicting a sequence of interactions, the rollout length, which limits the prediction horizon, is a critical hyperparameter: the accuracy of the predictions diminishes in regions farther from real experience. As a result, a longer rollout length leads to an overall worse policy in the long run, while a shorter one extracts less simulated experience from the model. The hyperparameter therefore trades off quality against efficiency.

In this work, we frame the problem of tuning the rollout length as a meta-level sequential decision-making problem. Given a fixed budget of environment interactions, the goal is to optimize the final policy learned by model-based reinforcement learning by adapting the hyperparameter dynamically, based on feedback from the learning process such as the accuracy of the model and the remaining budget of interactions. We use model-free deep reinforcement learning to solve the meta-level decision problem and demonstrate that our approach outperforms common heuristic baselines on two well-known reinforcement learning environments.
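To make the rollout-length trade-off concrete, here is a toy sketch under loud assumptions. The scalar dynamics `true_env` and `learned_model` are hypothetical, and `branched_rollouts` follows the common pattern of starting short model rollouts from real states (as in MBPO-style methods); the abstract does not say which base model-based algorithm the paper uses. The sketch shows two effects at once: the number of synthetic transitions grows with the rollout length `k`, and so does the gap between model predictions and reality.

```python
from typing import Callable, List, Tuple

# A dynamics function maps (state, action) -> next state.
Dynamics = Callable[[float, float], float]
Policy = Callable[[float], float]


def true_env(s: float, a: float) -> float:
    """Hypothetical real dynamics (toy, for illustration only)."""
    return 0.9 * s + a


def learned_model(s: float, a: float) -> float:
    """Hypothetical learned model with a small error in the state coefficient."""
    return 0.95 * s + a


def rollout(step: Dynamics, s0: float, actions: List[float]) -> List[float]:
    """Apply `step` repeatedly from s0; returns all visited states (len(actions) + 1)."""
    states = [s0]
    for a in actions:
        states.append(step(states[-1], a))
    return states


def rollout_error(k: int, s0: float = 1.0, a: float = 0.1) -> float:
    """Gap between the real and the model-predicted state after k identical actions."""
    actions = [a] * k
    real = rollout(true_env, s0, actions)
    pred = rollout(learned_model, s0, actions)
    return abs(real[-1] - pred[-1])


def branched_rollouts(
    model: Dynamics, start_states: List[float], policy: Policy, k: int
) -> List[Tuple[float, float, float]]:
    """Start a k-step model rollout from each real state and collect synthetic
    (s, a, s_next) transitions; yields len(start_states) * k transitions."""
    transitions = []
    for s in start_states:
        for _ in range(k):
            a = policy(s)
            s_next = model(s, a)
            transitions.append((s, a, s_next))
            s = s_next
    return transitions


if __name__ == "__main__":
    for k in (1, 5, 20):
        n = len(branched_rollouts(learned_model, [0.0, 1.0, 2.0], lambda s: 0.1, k))
        print(f"k={k:2d}  synthetic transitions={n:3d}  model error={rollout_error(k):.3f}")
```

In this toy the real trajectory stays at 1.0, so the error after `k` model steps is exactly 1 − 0.95^k: it keeps growing with `k`, while the amount of synthetic data grows linearly in `k`. Real learned models are far less well behaved, but the direction of the tension is the same one the rollout length controls.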

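The meta-level formulation can be sketched as a small gym-style environment whose state is feedback from the learning process, whose action is the next rollout length, and whose reward is the quality of the final policy once the interaction budget is spent. Everything in it is an assumption for illustration: `ToyInnerLearner` replaces a real model-based agent with a made-up update rule, and `ROLLOUT_CHOICES`, the two-number meta-state (model error and remaining budget, the two signals the abstract names), the sparse reward, and both heuristic schedules are hypothetical. `linear_schedule` is only loosely in the spirit of the linearly increasing rollout-length schedules used in some model-based methods.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical discrete action set: the rollout lengths the meta-policy may pick.
ROLLOUT_CHOICES = [1, 2, 4, 8, 16]

Obs = Tuple[float, float]  # (model error, fraction of interaction budget left)
MetaPolicy = Callable[[Obs], int]


@dataclass
class ToyInnerLearner:
    """Hypothetical stand-in for a model-based RL agent; its update rule only
    mimics the qualitative trade-off and is NOT a real algorithm."""
    model_error: float = 1.0
    policy_quality: float = 0.0

    def train_phase(self, real_steps: int, rollout_length: int) -> None:
        # Toy assumption: more real data makes the model more accurate.
        self.model_error *= 0.9 ** (real_steps / 1000)
        # Toy assumption: longer rollouts add synthetic data but amplify model error.
        gain = 0.01 * rollout_length
        penalty = 0.005 * self.model_error * rollout_length ** 2
        self.policy_quality += gain - penalty


class RolloutLengthMetaEnv:
    """Meta-level MDP: each step spends part of the real-interaction budget
    with the chosen rollout length, then reports learning feedback."""

    def __init__(self, budget: int = 10_000, steps_per_phase: int = 1_000):
        self.budget = budget
        self.steps_per_phase = steps_per_phase
        self.reset()

    def reset(self) -> Obs:
        self.learner = ToyInnerLearner()
        self.used = 0
        return self._obs()

    def _obs(self) -> Obs:
        return (self.learner.model_error, 1.0 - self.used / self.budget)

    def step(self, action: int) -> Tuple[Obs, float, bool]:
        k = ROLLOUT_CHOICES[action]
        self.learner.train_phase(self.steps_per_phase, k)
        self.used += self.steps_per_phase
        done = self.used >= self.budget
        # Sparse meta-reward: quality of the final policy, paid once the budget is spent.
        reward = self.learner.policy_quality if done else 0.0
        return self._obs(), reward, done


def fixed_schedule(obs: Obs) -> int:
    """Heuristic baseline: always use the shortest rollout."""
    return 0


def linear_schedule(obs: Obs) -> int:
    """Heuristic baseline: lengthen rollouts as the budget is consumed."""
    progress = 1.0 - obs[1]
    return min(int(progress * len(ROLLOUT_CHOICES)), len(ROLLOUT_CHOICES) - 1)


def run_episode(env: RolloutLengthMetaEnv, policy: MetaPolicy) -> float:
    """Run one meta-episode (one full training run) and return its total reward."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


if __name__ == "__main__":
    for name, pol in [("fixed", fixed_schedule), ("linear", linear_schedule)]:
        print(name, round(run_episode(RolloutLengthMetaEnv(), pol), 4))
```

In this framing, a model-free deep RL agent (the abstract does not say which algorithm) would take the place of `fixed_schedule` or `linear_schedule` as the `policy`, learning across many meta-episodes when to shorten or lengthen rollouts. The printed numbers come from the toy learner and say nothing about the paper's results.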
