Modern Deep Reinforcement Learning (RL) algorithms require estimates of the maximal Q-value, which are difficult to compute in continuous domains with an infinite number of possible actions. In this work, we introduce a new update rule for online and offline RL that directly models the maximal value using Extreme Value Theory (EVT), drawing inspiration from economics. By doing so, we avoid computing Q-values using out-of-distribution actions, which is often a substantial source of error. Our key insight is to introduce an objective that directly estimates the optimal soft-value functions (LogSumExp) in the maximum entropy RL setting without needing to sample from a policy. Using EVT, we derive our Extreme Q-Learning framework and, consequently, online and (for the first time) offline MaxEnt Q-learning algorithms that do not explicitly require access to a policy or its entropy. Our method obtains consistently strong performance on the D4RL benchmark, outperforming prior work by 10+ points on some tasks, while offering moderate improvements over SAC and TD3 on online DM Control tasks.
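To make the stated insight concrete, the following is a minimal, hypothetical sketch (not the authors' released implementation) of the Gumbel-style regression that the EVT view motivates: minimizing a loss of the form E[exp(z) - z - 1], with z = (q - v)/beta, over a scalar v recovers the soft maximum v* = beta * log E[exp(q/beta)], i.e., a LogSumExp-style value, without sampling actions from any policy. All names here (gumbel_regression_loss, soft_value_closed_form, beta) are illustrative assumptions, not identifiers from the paper's codebase.

```python
# Hypothetical sketch: Gumbel (linex) regression recovers a LogSumExp-style
# soft maximum of sampled "Q-values" without sampling from a policy.
#   L(v) = E[ exp((q - v)/beta) - (q - v)/beta - 1 ]
# is minimized at v* = beta * log E[exp(q / beta)].

import numpy as np

rng = np.random.default_rng(0)


def gumbel_regression_loss(v, q, beta):
    """Linex-style Gumbel regression loss; minimized at the soft maximum of q."""
    z = (q - v) / beta
    return np.mean(np.exp(z) - z - 1.0)


def soft_value_closed_form(q, beta):
    """Closed-form minimizer: beta * log E[exp(q / beta)]."""
    return beta * np.log(np.mean(np.exp(q / beta)))


# "Q-values" sampled from some behavior distribution (synthetic data).
q_samples = rng.normal(loc=1.0, scale=2.0, size=10_000)
beta = 1.0

# Simple gradient descent on v; dL/dv = (1 - E[exp(z)]) / beta.
v = 0.0
lr = 0.5
for _ in range(2000):
    z = (q_samples - v) / beta
    grad = (1.0 - np.mean(np.exp(z))) / beta
    v -= lr * grad

print("gradient-descent estimate:", v)
print("closed-form soft value:   ", soft_value_closed_form(q_samples, beta))
```

Both printed values should agree closely, illustrating how a regression objective of this form can stand in for the soft (LogSumExp) value target in the maximum entropy setting.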