Offline reinforcement learning (RL) learns policies entirely from static datasets, thereby avoiding the challenges associated with online data collection. Practical applications of offline RL will inevitably require learning from datasets where the variability of demonstrated behaviors changes non-uniformly across the state space. For example, at a red light, nearly all human drivers behave similarly by stopping, but when merging onto a highway, some drivers merge quickly, efficiently, and safely, while many hesitate or merge dangerously. Both theoretically and empirically, we show that typical offline RL methods based on distribution constraints fail to learn from data with such non-uniform variability, because they require the learned policy to stay close to the behavior policy to the same extent across the entire state space. Ideally, the learned policy should be free to choose, per state, how closely to follow the behavior policy in order to maximize long-term return, as long as it stays within the support of the behavior policy. To instantiate this principle, we reweight the data distribution in conservative Q-learning (CQL) to obtain an approximate support-constraint formulation. The reweighted distribution is a mixture of the current policy and an additional policy trained to mine poor actions that are likely under the behavior policy. Our method, CQL (ReDS), is simple, theoretically motivated, and improves performance across a wide range of offline RL problems in Atari games, navigation, and pixel-based manipulation.
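As a rough illustration of the reweighting described above (the notation here is ours and not necessarily the paper's exact formulation): let $\pi$ denote the learned policy, $\pi_\beta$ the behavior policy, $Q$ the learned critic, $\mathcal{D}$ the offline dataset, and $\rho_\psi$ the auxiliary "mining" distribution. One way to sketch the mixture distribution used in place of the current policy inside the CQL regularizer, together with a hypothetical KL-regularized objective for $\rho_\psi$, is

$$
\tilde{\rho}(a \mid s) \;=\; \tfrac{1}{2}\,\pi(a \mid s) \;+\; \tfrac{1}{2}\,\rho_\psi(a \mid s),
\qquad
\rho_\psi \;\in\; \arg\max_{\rho}\;
\mathbb{E}_{s \sim \mathcal{D},\, a \sim \rho(\cdot \mid s)}\!\big[-Q(s,a)\big]
\;-\; D_{\mathrm{KL}}\!\big(\rho(\cdot \mid s)\,\big\|\,\pi_\beta(\cdot \mid s)\big),
$$

whose closed-form solution has the Boltzmann-like shape $\rho_\psi(a \mid s) \propto \pi_\beta(a \mid s)\,\exp\!\big(-Q(s,a)\big)$, i.e. it concentrates on actions that are likely under the behavior policy but have low value. Pushing down Q-values under $\tilde{\rho}$ then penalizes poor in-support actions without forcing $\pi$ to match $\pi_\beta$ uniformly across states; the specific weighting and any temperature on $Q$ are illustrative assumptions here.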