Much of the recent success of Deep Reinforcement Learning is owed to the ability of neural architectures to learn and use effective internal representations of the world. While many current algorithms have access to a simulator and can train on large amounts of data, in realistic settings, including games played against people, collecting experience can be quite costly. In this paper, we introduce a deep reinforcement learning architecture whose purpose is to increase sample efficiency without sacrificing performance. We design this architecture by incorporating recent advances from the fields of Natural Language Processing and Computer Vision. Specifically, we propose a visually attentive model that uses transformers to learn a self-attention mechanism over the feature maps of the state representation, while simultaneously optimizing return. We demonstrate empirically that this architecture reduces sample complexity for several Atari environments, while also achieving better performance in some of the games.
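To make the architectural idea concrete, below is a minimal PyTorch sketch of the kind of model the abstract describes: a convolutional encoder produces spatial feature maps from stacked Atari frames, each spatial location is treated as a token, a transformer encoder applies self-attention over those tokens, and actor-critic heads are trained to optimize return. All names, layer sizes, and hyperparameters here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class AttentiveAtariEncoder(nn.Module):
    """Sketch: self-attention over CNN feature maps of an Atari state (assumed design)."""

    def __init__(self, in_channels: int = 4, embed_dim: int = 64,
                 num_heads: int = 4, num_actions: int = 6):
        super().__init__()
        # Standard Atari-style convolutional stack: 84x84 input -> 7x7 feature map.
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, embed_dim, kernel_size=3, stride=1), nn.ReLU(),
        )
        # Learned positional embedding for the 7x7 = 49 spatial tokens.
        self.pos = nn.Parameter(torch.zeros(1, 49, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=256,
            batch_first=True)
        self.attn = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # Actor-critic heads, trained jointly while optimizing return.
        self.policy = nn.Linear(embed_dim, num_actions)
        self.value = nn.Linear(embed_dim, 1)

    def forward(self, obs: torch.Tensor):
        # obs: (batch, 4, 84, 84) stack of grayscale frames.
        f = self.conv(obs)                     # (B, D, 7, 7)
        tokens = f.flatten(2).transpose(1, 2)  # (B, 49, D): one token per location
        tokens = self.attn(tokens + self.pos)  # self-attention over the feature map
        pooled = tokens.mean(dim=1)            # aggregate attended features
        return self.policy(pooled), self.value(pooled)


if __name__ == "__main__":
    net = AttentiveAtariEncoder()
    logits, value = net(torch.randn(2, 4, 84, 84))
    print(logits.shape, value.shape)  # torch.Size([2, 6]) torch.Size([2, 1])
```

The resulting policy logits and value estimate can then be plugged into any standard return-optimizing objective (e.g., an actor-critic loss); the hope, per the abstract, is that attending over spatial features extracts more signal per environment interaction than a plain convolutional encoder.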