There has recently been significant interest in training reinforcement learning (RL) agents in vision-based environments. This poses many challenges, such as high dimensionality and the potential for observational overfitting through spurious correlations. A promising approach to both of these problems is an attention bottleneck, which provides a simple and effective framework for learning high-performing policies, even in the presence of distractions. However, due to the poor scalability of attention architectures, these methods are restricted to low-resolution visual inputs and large patches (and thus small attention matrices). In this paper we make use of new efficient attention algorithms, recently shown to be highly effective for Transformers, and demonstrate that these techniques can be successfully adopted in the RL setting. This allows our attention-based controllers to scale to larger visual inputs and facilitates the use of smaller patches, even individual pixels, improving generalization. We show this on a range of tasks, from the Distracting Control Suite to vision-based quadruped robot locomotion. We also provide rigorous theoretical analysis of the proposed algorithm.
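To make the scalability claim concrete: the efficient attention algorithms referenced above replace the quadratic softmax attention with a kernelized, linear-time variant. The sketch below is a minimal illustration of this idea in NumPy, using a simple positive feature map as the kernel; the actual method in the paper (e.g. random-feature approximations in the Performer style) differs in detail, and all names and constants here are illustrative assumptions.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Kernelized (linear) attention: replaces softmax(Q K^T) V with
    phi(Q) (phi(K)^T V), reducing cost from O(L^2 d) to O(L d^2) in the
    sequence length L. phi here is a simple positive feature map chosen
    for illustration, not the paper's exact kernel."""
    phi = lambda X: np.maximum(X, 0.0) + eps   # positive feature map (assumption)
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                  # (d, d_v): keys/values aggregated once
    Z = Qp @ Kp.sum(axis=0)        # (L,): per-query normalizer
    return (Qp @ KV) / Z[:, None]  # (L, d_v)

# Example: L patches (or even individual pixels) with head dimension d.
L, d = 1024, 32
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, L, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (1024, 32)
```

Because the key-value summary `KV` is computed once, memory and compute grow linearly in the number of tokens rather than quadratically, which is what permits attention over many small patches or pixels.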