Meta-learning is a line of research that seeks the ability to leverage past experience to solve new learning problems efficiently. Meta-Reinforcement Learning (meta-RL) methods have demonstrated the capability to learn behaviors that efficiently acquire and exploit information across several meta-RL problems. In this context, Wang et al. [2021] proposed the Alchemy benchmark. Alchemy features a rich structured latent space that is challenging for state-of-the-art model-free RL methods, which fail to learn to first explore and then exploit. We develop a model-based algorithm: we train a model whose principal block is a Transformer Encoder to fit the dynamics of the symbolic Alchemy environment, and we then define an online planner over the learned model using a tree search method. This algorithm significantly outperforms the model-free RL methods previously applied to symbolic Alchemy. Our results highlight the relevance of model-based approaches with online planning for performing exploration and exploitation successfully in meta-RL. Moreover, they show the effectiveness of the Transformer architecture at learning the complex dynamics that arise from the latent spaces present in meta-RL problems.
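As a rough illustration of the two-stage pipeline described above (a learned Transformer dynamics model, then planning over it), the sketch below pairs a small Transformer-encoder dynamics model with a depth-1 action selector in PyTorch. The class and function names (SymbolicDynamicsModel, plan_one_step), the token encoding, and all architecture sizes are our own illustrative assumptions, not the paper's specification; a real planner would expand predicted next states recursively via tree search rather than scoring only immediate rewards.

```python
import torch
import torch.nn as nn

class SymbolicDynamicsModel(nn.Module):
    """Minimal Transformer-encoder dynamics model (illustrative; not the
    paper's exact architecture). Takes a sequence of integer tokens encoding
    the symbolic state plus the chosen action, and predicts per-token
    next-state logits along with the immediate reward."""
    def __init__(self, vocab_size: int, d_model: int = 128, nhead: int = 4,
                 num_layers: int = 2, max_len: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.next_state_head = nn.Linear(d_model, vocab_size)  # next-state token logits
        self.reward_head = nn.Linear(d_model, 1)               # scalar reward from pooled features

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, seq_len) integer ids for the symbolic state + action
        h = self.encoder(self.embed(tokens) + self.pos[:, : tokens.size(1)])
        return self.next_state_head(h), self.reward_head(h.mean(dim=1)).squeeze(-1)

@torch.no_grad()
def plan_one_step(model, state_tokens, candidate_actions):
    """Pick the action whose model-predicted immediate reward is highest:
    a depth-1 stand-in for online planning with the learned model."""
    best_action, best_reward = None, float("-inf")
    for action in candidate_actions:
        action_tok = torch.tensor([[action]])
        _, reward = model(torch.cat([state_tokens, action_tok], dim=1))
        if reward.item() > best_reward:
            best_action, best_reward = action, reward.item()
    return best_action

# Usage with dummy data: encode a symbolic state as 10 tokens, plan over 4 actions.
model = SymbolicDynamicsModel(vocab_size=32)
state = torch.randint(0, 32, (1, 10))
print(plan_one_step(model, state, candidate_actions=range(4)))
```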