Collaborative World Models: An Online-Offline Transfer RL Approach - 专知 (Zhuanzhi) Papers

May 25, 2023

Qi Wang, Junming Yang, Yunbo Wang, Xin Jin, Wenjun Zeng, Xiaokang Yang

Training visual reinforcement learning (RL) models on offline datasets is challenging due to overfitting in representation learning and overestimation in the value function. In this paper, we propose a transfer learning method called Collaborative World Models (CoWorld) to improve the performance of visual RL under offline conditions. The core idea is to use an easy-to-interact, off-the-shelf simulator to train an auxiliary RL model as the online "test bed" for the offline policy learned in the target domain, which provides a flexible constraint for the value function. Intuitively, we want to mitigate the overestimation of value functions outside the offline data distribution without impeding the exploration of actions with potential advantages. Specifically, CoWorld performs domain-collaborative representation learning to bridge the gap between online and offline hidden state distributions. Furthermore, it performs domain-collaborative behavior learning that enables the source RL agent to provide target-aware value estimation, allowing for effective offline policy regularization. Experiments show that CoWorld significantly outperforms existing methods in offline visual control tasks in DeepMind Control and Meta-World.
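The "flexible constraint" on the value function can be pictured with a small sketch. The abstract does not give the loss, so the following is only an illustrative assumption and not the authors' formulation: the function name `constrained_value_loss`, its arguments, and the one-sided penalty are all hypothetical. It regresses the offline (target-domain) critic toward its return targets and penalizes it only where it exceeds the estimate of the auxiliary online (source-domain) critic, so values below that estimate are left unconstrained.

```python
def constrained_value_loss(v_target, v_source, returns, weight=1.0):
    """Hypothetical loss, assumed for illustration only.

    v_target: value estimates from the offline (target-domain) critic
    v_source: value estimates from the auxiliary online (source-domain) critic
    returns:  regression targets computed from the offline data
    weight:   strength of the one-sided penalty (assumed hyperparameter)
    """
    n = len(v_target)
    # Squared-error regression toward the offline return targets.
    regression = sum((v - r) ** 2 for v, r in zip(v_target, returns)) / n
    # One-sided penalty: only target values above the source critic's
    # estimate are pushed down; lower values incur no cost.
    penalty = sum(max(0.0, vt - vs) for vt, vs in zip(v_target, v_source)) / n
    return regression + weight * penalty
```

With `weight=0.0` the function reduces to plain value regression, which makes the effect of the assumed penalty term easy to isolate.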
