The transformer architecture and its variants have achieved remarkable success across many machine learning tasks in recent years. This success is intrinsically related to their capability to handle long sequences and to the context-dependent weights produced by the attention mechanism. We argue that these capabilities suit the central role of a Meta-Reinforcement Learning (meta-RL) algorithm. Indeed, a meta-RL agent needs to infer the task from a sequence of trajectories. Furthermore, it requires a fast adaptation strategy to adjust its policy to a new task, which can be achieved using the self-attention mechanism. In this work, we present TrMRL (Transformers for Meta-Reinforcement Learning), a meta-RL agent that mimics the memory reinstatement mechanism using the transformer architecture. It associates recent working memories to build an episodic memory recursively through the transformer layers. We show that self-attention computes a consensus representation that minimizes the Bayes risk at each layer and provides meaningful features for computing the best actions. We conducted experiments in high-dimensional continuous control environments for locomotion and dexterous manipulation. The results show that TrMRL achieves comparable or superior asymptotic performance, sample efficiency, and out-of-distribution generalization relative to the baselines in these environments.
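To make the architectural idea concrete, the following is a minimal, hypothetical sketch of an agent in the spirit described above: a transformer encoder attends over a sequence of recent working memories (embedded transitions) and a policy head maps the resulting representation to an action. The class name, layer sizes, and transition encoding are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: transformer over recent transitions for a meta-RL policy.
import torch
import torch.nn as nn

class TransformerMetaPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, d_model=64, n_heads=4, n_layers=3):
        super().__init__()
        # Working memory: embed each (observation, action, reward) transition.
        self.embed = nn.Linear(obs_dim + act_dim + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Episodic-memory-like representation built through stacked transformer layers.
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.policy_head = nn.Linear(d_model, act_dim)

    def forward(self, obs, acts, rews):
        # obs: (B, T, obs_dim), acts: (B, T, act_dim), rews: (B, T, 1)
        tokens = self.embed(torch.cat([obs, acts, rews], dim=-1))
        memory = self.encoder(tokens)           # self-attention over the recent past
        return self.policy_head(memory[:, -1])  # act from the latest representation

# Usage: infer the task context from a short trajectory of 5 transitions.
policy = TransformerMetaPolicy(obs_dim=8, act_dim=2)
action = policy(torch.randn(1, 5, 8), torch.randn(1, 5, 2), torch.randn(1, 5, 1))
```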