Offline reinforcement learning (RL) provides a promising direction for exploiting massive amounts of offline data for complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism can impair the robustness of learned policies when they encounter observation deviations under realistic conditions, such as sensor errors and adversarial attacks. To trade off robustness and conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique. In RORL, we explicitly introduce regularization on the policy and the value function for states near the dataset, as well as additional conservative value estimation on these states. Theoretically, we show that RORL enjoys a tighter suboptimality bound than recent theoretical results in linear MDPs. We demonstrate that RORL achieves state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations.
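To make the conservative smoothing idea concrete, the sketch below illustrates one way the described regularization could be implemented in PyTorch: dataset states are perturbed within a small ball, Q-values and policy outputs at the perturbed states are pulled toward their values at the original states, and an extra penalty discourages overestimation at the perturbed (near-out-of-distribution) states. This is a minimal sketch under assumed interfaces (`q_net`, `policy`, `eps`, `n_samples`, `ood_weight` are all hypothetical names), not the paper's actual implementation; for instance, it samples perturbations uniformly rather than adversarially.

```python
import torch
import torch.nn.functional as F

def conservative_smoothing_losses(q_net, policy, states, actions,
                                  eps=0.01, n_samples=10, ood_weight=0.1):
    """Illustrative sketch (not the official RORL code): smoothing and
    conservative penalties on states perturbed within an L_inf ball."""
    B, S = states.shape
    # Sample perturbed states uniformly in the eps-ball around each dataset state.
    noise = (torch.rand(n_samples, B, S, device=states.device) * 2 - 1) * eps
    perturbed = (states.unsqueeze(0) + noise).reshape(-1, S)

    rep_actions = actions.repeat(n_samples, 1)
    q_clean = q_net(states, actions).repeat(n_samples, 1)
    q_pert = q_net(perturbed, rep_actions)

    # Value smoothing: keep Q at perturbed states close to Q at the clean states.
    value_smooth = F.mse_loss(q_pert, q_clean.detach())

    # Policy smoothing: keep the action output stable under state perturbation
    # (assumes a deterministic-mean policy head returning an action tensor).
    a_clean = policy(states)
    a_pert = policy(perturbed)
    policy_smooth = F.mse_loss(a_pert, a_clean.repeat(n_samples, 1).detach())

    # Conservative term: penalize overestimation at perturbed, near-OOD states.
    conservative = F.relu(q_pert - q_clean.detach()).mean()

    return value_smooth, policy_smooth, ood_weight * conservative
```

These three terms would be added, with suitable weights, to the base actor and critic objectives of an off-policy algorithm; the weighting and the perturbation scheme are design choices not specified by this abstract.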