Conventional reinforcement learning (RL) requires an environment for collecting fresh data, which is impractical when online interactions are costly. Offline RL provides an alternative by learning directly from a previously collected dataset. However, it yields unsatisfactory performance when the quality of the offline dataset is poor. In this paper, we consider an offline-to-online setting in which the agent is first trained on the offline dataset and then fine-tuned online, and we propose a framework called Adaptive Policy Learning to effectively take advantage of both offline and online data. Specifically, we explicitly account for the difference between online and offline data and apply an adaptive update scheme accordingly: a pessimistic update strategy for the offline dataset and an optimistic/greedy update scheme for the online dataset. This simple yet effective method provides a way to combine offline and online RL and achieve the best of both worlds. We further provide two concrete algorithms that implement the framework by embedding value-based or policy-based RL algorithms into it. Finally, we conduct extensive experiments on popular continuous control tasks; the results show that our algorithm can learn an expert policy with high sample efficiency even when the quality of the offline dataset is poor, e.g., a random dataset.
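To make the adaptive update idea concrete, the following is a minimal, self-contained sketch: transitions drawn from the offline dataset are trained with a pessimistic (CQL-style conservative) value update, while freshly collected online transitions use the standard greedy Q-learning update. The toy tabular setting, the routing function `adaptive_step`, and the pessimism weight `ALPHA` are illustrative assumptions for this sketch, not the paper's exact algorithm.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2
GAMMA, LR, ALPHA = 0.99, 0.1, 1.0   # ALPHA: pessimism weight (assumed value)

Q = np.zeros((N_STATES, N_ACTIONS))

def greedy_update(s, a, r, s_next):
    """Standard (optimistic/greedy) Q-learning update, used for online data."""
    target = r + GAMMA * Q[s_next].max()
    Q[s, a] += LR * (target - Q[s, a])

def pessimistic_update(s, a, r, s_next):
    """Conservative update used for offline data: the same TD target plus a
    CQL-style penalty that pushes down Q-values of actions outside the dataset
    and pushes up the Q-value of the action actually taken."""
    target = r + GAMMA * Q[s_next].max()
    td_error = target - Q[s, a]
    softmax = np.exp(Q[s] - Q[s].max())
    softmax /= softmax.sum()
    Q[s] -= LR * ALPHA * softmax          # push down all actions (weighted)
    Q[s, a] += LR * (td_error + ALPHA)    # TD step plus push up the data action

def adaptive_step(transition, from_offline):
    """Route each transition to the update rule matching its data source."""
    s, a, r, s_next = transition
    if from_offline:
        pessimistic_update(s, a, r, s_next)
    else:
        greedy_update(s, a, r, s_next)
```

In a full offline-to-online run, batches sampled from the offline buffer would be passed with `from_offline=True` and newly collected interactions with `from_offline=False`, so the degree of pessimism automatically tracks the source of the data rather than being fixed for the whole training process.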