Transformers are neural network models built from multiple layers of self-attention heads. In transformers, attention is computed from the contextual embeddings of 'queries' and 'keys'. Transformers allow attention information from different layers to be recombined and all inputs to be processed at once, which makes them more convenient than recurrent neural networks when dealing with large amounts of data. Transformers have exhibited strong performance on natural language processing tasks in recent years. Meanwhile, there have been tremendous efforts to adapt transformers to other fields of machine learning, such as the Swin Transformer and the Decision Transformer. The Swin Transformer is a promising neural network architecture that splits image pixels into small patches and applies local self-attention operations within (shifted) windows of fixed size. The Decision Transformer has successfully applied transformers to offline reinforcement learning, showing that random-walk samples from Atari games are sufficient for an agent to learn optimized behaviors. However, combining online reinforcement learning with transformers is considerably more challenging. In this article, we further explore the possibility of leaving the reinforcement learning policy unmodified and only replacing the convolutional neural network architecture with the self-attention architecture of the Swin Transformer. That is, we aim to change how an agent views the world, but not how an agent plans about the world. We conduct our experiments on 49 games in the Arcade Learning Environment. The results show that using the Swin Transformer in reinforcement learning achieves significantly higher evaluation scores across the majority of games in the Arcade Learning Environment. Thus, we conclude that online reinforcement learning can benefit from exploiting self-attention with spatial token embeddings.
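The query-key attention described above can be illustrated with a minimal NumPy sketch of single-head scaled dot-product self-attention; the function name, projection matrices, and dimensions here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def scaled_dot_product_attention(x, w_q, w_k, w_v):
    """Single-head self-attention: each token's output is a weighted
    mix of value vectors, with weights from query-key similarity."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v           # project tokens
    scores = q @ k.T / np.sqrt(k.shape[-1])       # pairwise similarities
    # softmax over keys (numerically stabilized)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # contextual embeddings

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))   # 4 tokens, 8-dimensional embeddings
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8): one contextual embedding per token
```

Because every token attends to every other token in one matrix product, all inputs are processed at once; a windowed variant like the Swin Transformer's simply restricts the score matrix to tokens inside the same local window.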