Decision Transformers (DT) have demonstrated strong performance in offline reinforcement learning settings, but quickly adapting to unseen novel tasks remains challenging. To address this challenge, we propose a new framework, called Hyper-Decision Transformer (HDT), that can generalize to novel tasks from a handful of demonstrations in a data- and parameter-efficient manner. To achieve this goal, we augment the base DT with an adaptation module whose parameters are initialized by a hyper-network. When encountering unseen tasks, the hyper-network takes a handful of demonstrations as input and initializes the adaptation module accordingly. This initialization enables HDT to adapt efficiently to novel tasks by fine-tuning only the adaptation module. We validate HDT's generalization capability on object manipulation tasks. We find that, with a single expert demonstration and fine-tuning only 0.5% of DT parameters, HDT adapts to unseen tasks faster than fine-tuning the whole DT model. Finally, we explore a more challenging setting where expert actions are not available, and we show that HDT outperforms state-of-the-art baselines in task success rate by a large margin.
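The core mechanism described above, a hyper-network that maps a demonstration embedding to the initial weights of a small adaptation module inserted into a frozen base model, can be illustrated with a minimal sketch. All names, dimensions, and the linear hyper-network form below are illustrative assumptions, not the paper's exact implementation; the adapter here is a simple residual bottleneck layer.

```python
import random

def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

class HyperAdapter:
    """Illustrative sketch (not the paper's exact architecture): a linear
    hyper-network maps an embedding of a few demonstrations to the weights
    of a small bottleneck adapter. Only the adapter would be fine-tuned on
    a novel task; the base DT stays frozen."""

    def __init__(self, demo_dim, hidden_dim, bottleneck, seed=0):
        self.hidden_dim, self.bottleneck = hidden_dim, bottleneck
        rng = random.Random(seed)
        # One linear map generating all adapter weights at once:
        # a down-projection (bottleneck x hidden) and an up-projection
        # (hidden x bottleneck), flattened into a single output vector.
        n_out = 2 * hidden_dim * bottleneck
        self.H = [[rng.uniform(-0.1, 0.1) for _ in range(demo_dim)]
                  for _ in range(n_out)]

    def init_adapter(self, demo_embedding):
        """Produce task-specific adapter weights from a demo embedding."""
        flat = matvec(self.H, demo_embedding)
        b, d = self.bottleneck, self.hidden_dim
        down = [flat[i * d:(i + 1) * d] for i in range(b)]              # b x d
        up = [flat[b * d + i * b: b * d + (i + 1) * b] for i in range(d)]  # d x b
        return down, up

def adapter_forward(h, down, up):
    """Residual adapter applied to a hidden state of the frozen model."""
    return [hi + ui for hi, ui in zip(h, matvec(up, matvec(down, h)))]

# Initialize an adapter from a (hypothetical) demonstration embedding,
# then apply it to a hidden state of matching width.
hyper = HyperAdapter(demo_dim=4, hidden_dim=8, bottleneck=2)
down, up = hyper.init_adapter([0.5, -0.2, 0.1, 0.3])
out = adapter_forward([0.1] * 8, down, up)
```

The bottleneck keeps the adapter tiny relative to the base model, which is how fine-tuning can touch only a fraction of the total parameters (0.5% in the abstract's experiments).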