在信任之前双次检查您的状态: 信任- 软件双向离线模型想象 (Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination) - 专知论文

会员服务 ·

0

Learning · Performer · MoDELS · 前向 · 数据集 ·

2022 年 6 月 16 日

Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination

翻译：在信任之前双次检查您的状态: 信任- 软件双向离线模型想象

Jiafei Lyu,Xiu Li,Zongqing Lu

The learned policy of model-free offline reinforcement learning (RL) methods is often constrained to stay within the support of datasets to avoid possible dangerous out-of-distribution actions or states, making it challenging to handle out-of-support region. Model-based RL methods offer a richer dataset and benefit generalization by generating imaginary trajectories with either trained forward or reverse dynamics model. However, the imagined transitions may be inaccurate, thus downgrading the performance of the underlying offline RL method. In this paper, we propose to augment the offline dataset by using trained bidirectional dynamics models and rollout policies with double check. We introduce conservatism by trusting samples that the forward model and backward model agree on. Our method, confidence-aware bidirectional offline model-based imagination, generates reliable samples and can be combined with any model-free offline RL method. Experimental results on the D4RL benchmarks demonstrate that our method significantly boosts the performance of existing model-free offline RL algorithms and achieves competitive or better scores against baseline methods.

翻译：所学的无模式离线强化学习方法(RL)政策往往局限于支持数据集,以避免可能的危险分配外行动或国家,从而难以应对支持区域。基于模型的RL方法通过产生具有受过训练的前向或反向动态模型的想象轨迹,提供了更丰富的数据集和惠益的概括化。然而,想象的转变可能不准确,从而降低了离线下线强化方法的性能。在本文中,我们提议通过使用经过培训的双向动态模型和双向检查的推出政策来增加离线数据集。我们通过信任样本来引入由前向模式和后向模式所同意的保守主义。我们的方法、信心觉察双向离线模型的想象力、生成可靠的样本,并与任何无模式的离线RL方法相结合。 D4RL基准的实验结果表明,我们的方法大大提升了现有无模式离线下逻辑算法的性能,并实现了与基线方法相比具有竞争力或更好的分数。

0

相关内容

Learning

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

【USC-Aaron Chan博士答辩Slides】可信自然语言处理机器解释的生成与利用, 242页ppt，Generating and Utilizing Machine Explanations for Trustworthy NLP

【USC-Aaron Chan博士答辩Slides】可信自然语言处理机器解释的生成与利用, 242页ppt，Generating and Utilizing Machine Explanations for Trustworthy NLP

专知会员服务

16+阅读 · 2022年3月13日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

44+阅读 · 2015年12月31日

III型AtOFPs转录因子对拟南芥荚果形态的调控

国家自然科学基金

0+阅读 · 2014年12月31日

PRMT1介导的EMT与细胞衰老在乳腺癌转移中的作用及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

β2-AR /PCBP2相互作用在胰腺癌发生发展中的作用及机制

国家自然科学基金

0+阅读 · 2014年12月31日

华北平原典型长大地裂缝形成演化机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

PTBP1介导的survivinΔEx3过表达调控胶质母细胞瘤微血管增生的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

AR/let-7及其下游分子对ER-AR+乳腺癌干细胞生长的调控机制

国家自然科学基金

0+阅读 · 2011年12月31日

Reality-based Interaction用户界面模型和评估方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

粉体热压原位生成塑性耐高温梯度中间层连接镍与C/SiC的机理

国家自然科学基金

0+阅读 · 2009年12月31日

ClC-3氯通道调控多发性骨髓瘤细胞周期及机制的研究

国家自然科学基金

0+阅读 · 2009年12月31日

Using Instruments for Selection to Adjust for Selection Bias in Mendelian Randomization

Arxiv

0+阅读 · 2022年8月4日

Backward Imitation and Forward Reinforcement Learning via Bi-directional Model Rollouts

Arxiv

0+阅读 · 2022年8月4日

Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures

Arxiv

0+阅读 · 2022年8月3日

Randomization-based joint central limit theorem and efficient covariate adjustment in stratified $2^K$ factorial experiments

Arxiv

0+阅读 · 2022年8月3日

What can we Learn by Predicting Accuracy?

Arxiv

0+阅读 · 2022年8月2日

Model-Twin Randomization (MoTR): A Monte Carlo Method for Estimating the Within-Individual Average Treatment Effect Using Wearable Sensors

Model-Twin Randomization (MoTR): A Monte Carlo Method for Estimating the Within-Individual Average Treatment Effect Using Wearable Sensors

Arxiv

0+阅读 · 2022年8月2日

GINK: Graph-based Interaction-aware Kinodynamic Planning via Reinforcement Learning for Autonomous Driving

Arxiv

0+阅读 · 2022年8月2日

A Unifying Framework for Combining Complementary Strengths of Humans and ML toward Better Predictive Decision-Making

Arxiv

0+阅读 · 2022年8月2日

Faster Meta Update Strategy for Noise-Robust Deep Learning

Arxiv

11+阅读 · 2021年4月30日

Time-Series Event Prediction with Evolutionary State Graph

Arxiv

14+阅读 · 2020年11月25日

VIP会员

文章信息

相关主题

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

【USC-Aaron Chan博士答辩Slides】可信自然语言处理机器解释的生成与利用, 242页ppt，Generating and Utilizing Machine Explanations for Trustworthy NLP

【USC-Aaron Chan博士答辩Slides】可信自然语言处理机器解释的生成与利用, 242页ppt，Generating and Utilizing Machine Explanations for Trustworthy NLP

专知会员服务

16+阅读 · 2022年3月13日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】低维与高维空间中潜在表征的分析、建模与变换

《生态建模密码破译：建模与编程实践》美陆军最新报告

大模型解决方案白皮书：社交陪伴场景全流程落地指南

面向具身操作的视觉-语言-动作模型综述

相关资讯

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Using Instruments for Selection to Adjust for Selection Bias in Mendelian Randomization

Arxiv

0+阅读 · 2022年8月4日

Backward Imitation and Forward Reinforcement Learning via Bi-directional Model Rollouts

Arxiv

0+阅读 · 2022年8月4日

Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures

Arxiv

0+阅读 · 2022年8月3日

Randomization-based joint central limit theorem and efficient covariate adjustment in stratified $2^K$ factorial experiments

Arxiv

0+阅读 · 2022年8月3日

What can we Learn by Predicting Accuracy?

Arxiv

0+阅读 · 2022年8月2日

Model-Twin Randomization (MoTR): A Monte Carlo Method for Estimating the Within-Individual Average Treatment Effect Using Wearable Sensors

Model-Twin Randomization (MoTR): A Monte Carlo Method for Estimating the Within-Individual Average Treatment Effect Using Wearable Sensors

Arxiv

0+阅读 · 2022年8月2日

GINK: Graph-based Interaction-aware Kinodynamic Planning via Reinforcement Learning for Autonomous Driving

Arxiv

0+阅读 · 2022年8月2日

A Unifying Framework for Combining Complementary Strengths of Humans and ML toward Better Predictive Decision-Making

Arxiv

0+阅读 · 2022年8月2日

Faster Meta Update Strategy for Noise-Robust Deep Learning

Arxiv

11+阅读 · 2021年4月30日

Time-Series Event Prediction with Evolutionary State Graph

Arxiv

14+阅读 · 2020年11月25日

相关基金

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

44+阅读 · 2015年12月31日

III型AtOFPs转录因子对拟南芥荚果形态的调控

国家自然科学基金

0+阅读 · 2014年12月31日

PRMT1介导的EMT与细胞衰老在乳腺癌转移中的作用及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

β2-AR /PCBP2相互作用在胰腺癌发生发展中的作用及机制

国家自然科学基金

0+阅读 · 2014年12月31日

华北平原典型长大地裂缝形成演化机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

PTBP1介导的survivinΔEx3过表达调控胶质母细胞瘤微血管增生的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

AR/let-7及其下游分子对ER-AR+乳腺癌干细胞生长的调控机制

国家自然科学基金

0+阅读 · 2011年12月31日

Reality-based Interaction用户界面模型和评估方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

粉体热压原位生成塑性耐高温梯度中间层连接镍与C/SiC的机理

国家自然科学基金

0+阅读 · 2009年12月31日

ClC-3氯通道调控多发性骨髓瘤细胞周期及机制的研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员