Contextual Conservative Q-Learning for Offline Reinforcement Learning - Zhuanzhi Papers



Learning · Reinforcement Learning · Overestimation · MoDELS · INFORMS

January 3, 2023

Contextual Conservative Q-Learning for Offline Reinforcement Learning


Ke Jiang, Jiayu Yao, Xiaoyang Tan

Offline reinforcement learning learns an effective policy from offline datasets without online interaction, and it has attracted sustained research attention because of its potential for practical applications. However, extrapolation error caused by distribution shift still leads to overestimation of the values of actions that transition to out-of-distribution (OOD) states, which degrades the reliability and robustness of the offline policy. In this paper, we propose Contextual Conservative Q-Learning (C-CQL), which learns a robustly reliable policy using contextual information captured by an inverse dynamics model. Under the supervision of the inverse dynamics model, C-CQL tends to learn a policy that produces stable transitions at perturbed states, since perturbed states are a common kind of OOD state. In this way, the learned policy is more likely to generate transitions that lead into the empirical next-state distribution of the offline dataset, i.e., robustly reliable transitions. We also show theoretically that C-CQL generalizes both Conservative Q-Learning (CQL) and aggressive State Deviation Correction (SDC). Finally, experimental results show that C-CQL achieves state-of-the-art performance in most environments of the offline MuJoCo suite and in a noisy MuJoCo setting.
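For readers unfamiliar with the CQL baseline that C-CQL generalizes, the following is a minimal sketch of the conservative regularizer from the original CQL paper (the CQL(H) form), assuming a small discrete action space with Q-values given as plain Python lists. It is not the C-CQL objective: the abstract does not say how the inverse dynamics model enters the loss, so that part is deliberately omitted. The function names, the max-based Bellman backup, and the lack of terminal-state handling are illustrative assumptions.

```python
import math
from typing import List, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) over a list of Q-values."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def cql_penalty(q_rows: List[List[float]], data_actions: List[int]) -> float:
    """Batch mean of logsumexp_a Q(s, a) - Q(s, a_data).

    Pushes Q down on all actions (soft maximum) while pushing it up on
    the action actually taken in the dataset.
    """
    total = 0.0
    for q_s, a in zip(q_rows, data_actions):
        total += logsumexp(q_s) - q_s[a]
    return total / len(q_rows)


def td_loss(q_rows: List[List[float]], data_actions: List[int],
            rewards: List[float], next_q_rows: List[List[float]],
            gamma: float) -> float:
    """Mean squared Bellman error with a max backup (assumption: no terminals)."""
    total = 0.0
    for q_s, a, r, q_next in zip(q_rows, data_actions, rewards, next_q_rows):
        target = r + gamma * max(q_next)
        total += (q_s[a] - target) ** 2
    return total / len(q_rows)


def cql_loss(q_rows: List[List[float]], data_actions: List[int],
             rewards: List[float], next_q_rows: List[List[float]],
             gamma: float, alpha: float) -> float:
    """alpha * conservative penalty + 1/2 * Bellman error, as in CQL(H)."""
    return (alpha * cql_penalty(q_rows, data_actions)
            + 0.5 * td_loss(q_rows, data_actions, rewards, next_q_rows, gamma))


if __name__ == "__main__":
    # One transition with two actions; action 0 was taken in the dataset.
    loss = cql_loss([[1.0, 0.0]], [0], [0.5], [[2.0, 1.0]], gamma=0.5, alpha=1.0)
    print(f"CQL loss: {loss:.4f}")
```

Because log-sum-exp is always at least the largest Q-value, the penalty is non-negative and shrinks only when dataset actions dominate the Q-function; this is how CQL counters the overestimation of OOD actions that the abstract describes. The weight `alpha` trades conservatism against fitting the Bellman target.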


