Unlocking Pixels for Reinforcement Learning via Implicit Attention

Krzysztof Marcin Choromanski, Deepali Jain, Wenhao Yu, Xingyou Song, Jack Parker-Holder, Tingnan Zhang, Valerii Likhosherstov, Aldo Pacchiano, Anirban Santara, Yunhao Tang, Jie Tan, Adrian Weller

There has recently been significant interest in training reinforcement learning (RL) agents in vision-based environments. This poses many challenges, such as high dimensionality and the potential for observational overfitting through spurious correlations. A promising approach to both problems is an attention bottleneck, which provides a simple and effective framework for learning high-performing policies, even in the presence of distractions. However, because attention architectures scale poorly, these methods cannot be applied beyond low-resolution visual inputs and must use large patches (and thus small attention matrices). In this paper we make use of new efficient attention algorithms, recently shown to be highly effective for Transformers, and demonstrate that these techniques can be successfully adopted for the RL setting. This allows our attention-based controllers to scale to larger visual inputs and facilitates the use of smaller patches, even individual pixels, improving generalization. We show this on a range of tasks, from the Distracting Control Suite to vision-based quadruped robot locomotion. We also provide a rigorous theoretical analysis of the proposed algorithm.
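The scalability argument in the abstract can be made concrete with a small sketch. It is not the paper's algorithm: it assumes a generic kernelized attention in which scores are inner products of positive feature vectors, with `feature_map` (elu(x) + 1) chosen purely for illustration. Under that assumption, reordering the matrix products gives the same output without ever forming the L × L attention matrix, where L is the number of patches. All function names and the 96 × 96 frame size are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def feature_map(x):
    """Illustrative positive feature map, elu(x) + 1 (an assumption, not the paper's choice)."""
    return [[v + 1.0 if v > 0 else math.exp(v) for v in row] for row in x]


def explicit_attention(q, k, v):
    """Kernel attention that materializes the full L x L matrix (quadratic in L)."""
    qf, kf = feature_map(q), feature_map(k)
    scores = matmul(qf, transpose(kf))  # L x L
    weights = []
    for row in scores:
        total = sum(row)
        weights.append([s / total for s in row])  # row-normalize the scores
    return matmul(weights, v)


def implicit_attention(q, k, v):
    """Same output, computed without the L x L matrix (linear in L)."""
    qf, kf = feature_map(q), feature_map(k)
    kv = matmul(transpose(kf), v)  # m x d summary of keys and values
    k_sum = [sum(col) for col in zip(*kf)]  # length-m sum of key features
    out = matmul(qf, kv)  # L x d, unnormalized
    norms = [sum(a * b for a, b in zip(row, k_sum)) for row in qf]
    return [[o / n for o in row] for row, n in zip(out, norms)]


def num_patches(height, width, patch):
    """Number of non-overlapping patch x patch tiles in a height x width frame."""
    return (height // patch) * (width // patch)


if __name__ == "__main__":
    # Hypothetical 96 x 96 frame: smaller patches mean many more tokens.
    for patch in (8, 4, 1):
        n = num_patches(96, 96, patch)
        print(f"patch={patch}: L={n}, explicit attention entries={n * n}")
```

Shrinking the patch size grows L quadratically in the inverse patch side, and the explicit attention matrix grows as L squared on top of that, which is why pixel-level patches are impractical unless the matrix is kept implicit.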


Related content

Attention mechanism


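The attention mechanism was first proposed in the field of computer vision, but it really took off with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate" [1], used an attention-like mechanism to perform translation and alignment jointly on machine translation tasks; their work is regarded as the first to apply attention to NLP. Attention-based extensions of RNN models were then applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the general trend of progress in attention research.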

[DeepMind] Model-Based Reinforcement Learning (174-page slide deck)

Zhuanzhi Member Service

89+ reads · January 12, 2021

CURL: Contrastive Unsupervised Representations for Reinforcement Learning

Zhuanzhi Member Service

41+ reads · April 11, 2020

[University of Oxford] Deep Residual Reinforcement Learning

Zhuanzhi Member Service

84+ reads · February 18, 2020