In a partially observable Markov decision process (POMDP), an agent typically uses a representation of the past to approximate the underlying MDP. We propose to utilize a frozen Pretrained Language Transformer (PLT) for history representation and compression to improve sample efficiency. To avoid training the Transformer, we introduce FrozenHopfield, which automatically associates observations with original token embeddings. To form these associations, a modern Hopfield network stores the original token embeddings, which are retrieved by queries obtained through a random but fixed projection of observations. Our new method, HELM, enables actor-critic network architectures that contain a pretrained language Transformer as a memory module for history representation. Since a representation of the past need not be learned, HELM is much more sample efficient than competitors. On Minigrid and Procgen environments, HELM achieves new state-of-the-art results. Our code is available at https://github.com/ml-jku/helm.
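To make the association step concrete, below is a minimal sketch of the FrozenHopfield retrieval in PyTorch. The shapes, the scaling of the random projection, and the inverse temperature `beta` are illustrative assumptions rather than the paper's exact configuration; the key point is that nothing here is trained, since the token embeddings are frozen and the projection is random but fixed.

```python
import torch

torch.manual_seed(0)
vocab_size, d_embed, d_obs, beta = 50257, 768, 64, 1.0  # assumed sizes

# Stored patterns: frozen token-embedding matrix of the pretrained Transformer.
E = torch.randn(vocab_size, d_embed)
# Random but fixed projection from observation space to embedding space
# (scaled for stable magnitudes; the exact scaling is an assumption here).
P = torch.randn(d_embed, d_obs) * (d_embed ** -0.5)

def frozen_hopfield(obs: torch.Tensor) -> torch.Tensor:
    """Associate an observation with the stored token embeddings via one
    modern-Hopfield retrieval step (softmax attention over E)."""
    q = obs @ P.T                                  # query in embedding space
    attn = torch.softmax(beta * q @ E.T, dim=-1)   # similarity to stored patterns
    return attn @ E                                # convex combination of embeddings

obs = torch.randn(1, d_obs)          # e.g., an encoded environment observation
token_like = frozen_hopfield(obs)    # fed to the frozen language Transformer
```

The retrieved vector lies in the convex hull of the original token embeddings, so it is a valid input to the frozen PLT without any fine-tuning of the Transformer itself.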