Deep reinforcement learning agents are notoriously sample inefficient, which considerably limits their application to real-world problems. Recently, many model-based methods have been designed to address this issue, with learning in the imagination of a world model being one of the most prominent approaches. However, while virtually unlimited interaction with a simulated environment sounds appealing, the world model has to be accurate over extended periods of time. Motivated by the success of Transformers in sequence modeling tasks, we introduce IRIS, a data-efficient agent that learns in a world model composed of a discrete autoencoder and an autoregressive Transformer. With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046, and outperforms humans on 10 out of 26 games. Our approach sets a new state of the art for methods without lookahead search, and even surpasses MuZero. To foster future research on Transformers and world models for sample-efficient reinforcement learning, we release our codebase at https://github.com/eloialonso/iris.
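For concreteness, below is a minimal sketch of the two-component world model the abstract describes: a discrete autoencoder that compresses frames into tokens, and an autoregressive Transformer that models the token sequence. This is an illustrative PyTorch sketch under stated assumptions, not the released IRIS implementation; the module names, layer sizes, 64x64-frame / 8x8-token-grid layout, and vocabulary size of 512 are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class DiscreteAutoencoder(nn.Module):
    """VQ-style autoencoder: frame -> grid of discrete tokens -> frame.
    All shapes and sizes here are illustrative assumptions."""
    def __init__(self, vocab_size=512, embed_dim=64):
        super().__init__()
        # 3x64x64 frame -> embed_dim x 8x8 feature grid
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, embed_dim, 4, stride=2, padding=1),
        )
        self.codebook = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(embed_dim, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )

    def tokenize(self, frames):
        """(B, 3, 64, 64) frames -> (B, 64) ids of nearest codebook entries."""
        z = self.encoder(frames)                                     # (B, D, 8, 8)
        flat = z.flatten(2).transpose(1, 2).reshape(-1, z.size(1))   # (B*64, D)
        dists = torch.cdist(flat, self.codebook.weight)              # (B*64, V)
        return dists.argmin(-1).view(frames.size(0), -1)

    def decode(self, tokens):
        """(B, 64) token ids -> (B, 3, 64, 64) reconstructed frames."""
        z = self.codebook(tokens).transpose(1, 2).reshape(tokens.size(0), -1, 8, 8)
        return self.decoder(z)


class TokenWorldModel(nn.Module):
    """Causal Transformer over the token sequence; in a full agent, actions
    would be interleaved as extra ids offset by vocab_size (omitted here)."""
    def __init__(self, vocab_size=512, n_actions=18, d_model=256, n_layers=4):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size + n_actions, d_model)
        self.pos_emb = nn.Embedding(1024, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, vocab_size)  # next-frame-token logits

    def forward(self, tokens):
        """(B, T) token ids -> (B, T, vocab_size) next-token logits."""
        pos = torch.arange(tokens.size(1), device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(pos)
        # Causal mask so each position attends only to past tokens
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1)).to(tokens.device)
        return self.head(self.blocks(x, mask=mask))


ae, wm = DiscreteAutoencoder(), TokenWorldModel()
frames = torch.rand(2, 3, 64, 64)   # a batch of Atari-like frames
tokens = ae.tokenize(frames)        # (2, 64) discrete token ids
logits = wm(tokens)                 # (2, 64, 512) next-token predictions
recon = ae.decode(tokens)           # (2, 3, 64, 64) reconstruction
```

The key design choice the abstract points to is that imagination happens entirely in token space: the policy rolls out trajectories by sampling next tokens from the Transformer, so the autoencoder only needs to decode frames when pixel observations are required.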