Pessimism is of great importance in offline reinforcement learning (RL). One broad category of offline RL algorithms achieves pessimism through explicit or implicit behavior regularization. However, most of them consider only policy divergence as behavior regularization, ignoring how the offline state distribution differs from that of the learning policy, which may lead to under-pessimism for some states and over-pessimism for others. To address this problem, we propose a principled algorithmic framework for offline RL, called \emph{State-Aware Proximal Pessimism} (SA-PP). The key idea of SA-PP is to leverage the discounted stationary state distribution ratio between the learning policy and the offline dataset to modulate the degree of behavior regularization in a state-wise manner, so that pessimism is applied where it is actually warranted. We first provide theoretical justification for the superiority of SA-PP over previous algorithms, demonstrating that SA-PP yields a lower suboptimality upper bound in a broad range of settings. Furthermore, we propose a new algorithm named \emph{State-Aware Conservative Q-Learning} (SA-CQL), which instantiates SA-PP on top of the representative CQL algorithm, using DualDICE to estimate the discounted stationary state distribution ratios. Extensive experiments on standard offline RL benchmarks show that SA-CQL outperforms popular baselines on a large portion of the tasks and attains the highest average return.
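To make the key idea concrete, the following is a minimal sketch of how a state-wise weight could modulate a CQL-style conservative penalty; the exact objective used by SA-CQL is specified in the main text, and the weight $w(s)$, coefficient $\alpha$, and behavior policy $\pi_\beta$ below are illustrative assumptions drawn from the description above:
\begin{equation*}
\min_{Q}\; \alpha\, \mathbb{E}_{s\sim d^{\mathcal{D}}}\!\left[ w(s)\left( \log\!\sum_{a}\exp Q(s,a) \;-\; \mathbb{E}_{a\sim\pi_\beta(\cdot\mid s)}\!\left[Q(s,a)\right] \right)\right] \;+\; \mathbb{E}_{(s,a)\sim\mathcal{D}}\!\left[\left(Q(s,a) - \mathcal{B}^{\pi}\hat{Q}(s,a)\right)^{2}\right],
\end{equation*}
where $w(s)\approx d^{\pi}(s)/d^{\mathcal{D}}(s)$ denotes the discounted stationary state distribution ratio estimated with DualDICE, so that states over-represented under the learning policy relative to the dataset receive a stronger conservative penalty.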