
June 8, 2023

Offline Prioritized Experience Replay


Yang Yue, Bingyi Kang, Xiao Ma, Gao Huang, Shiji Song, Shuicheng Yan

From arXiv (preprint)

Offline reinforcement learning (RL) is challenged by the distributional shift problem. To address this problem, existing works mainly focus on designing sophisticated policy constraints between the learned policy and the behavior policy. However, because transitions are sampled uniformly, these constraints are applied equally to well-performing and inferior actions, which can hurt the learned policy. To alleviate this issue, we propose Offline Prioritized Experience Replay (OPER), featuring a class of priority functions designed to prioritize highly rewarding transitions so that they are visited more frequently during training. Through theoretical analysis, we show that this class of priority functions induces an improved behavior policy, and that a policy-constrained offline RL algorithm constrained to this improved policy is likely to yield a better solution. We develop two practical strategies for obtaining priority weights: estimating advantages with a fitted value network (OPER-A), or using trajectory returns for quick computation (OPER-R). OPER is a plug-and-play component for offline RL algorithms. As case studies, we evaluate OPER with five algorithms: BC, TD3+BC, Onestep RL, CQL, and IQL. Extensive experiments demonstrate that both OPER-A and OPER-R significantly improve the performance of all baseline methods. Code and priority weights are available at https://github.com/sail-sg/OPER.


