Meta reinforcement learning (RL) attempts to discover new RL algorithms automatically from environment interaction. In so-called black-box approaches, the policy and the learning algorithm are jointly represented by a single neural network. These methods are very flexible, but they tend to generalise poorly to new, unseen environments. In this paper, we explore the role of symmetries in meta-generalisation. We show that a recent successful meta RL approach that meta-learns an objective for backpropagation-based learning exhibits certain symmetries (specifically, reuse of the learning rule, and invariance to input and output permutations) that are not present in typical black-box meta RL systems. We hypothesise that these symmetries can play an important role in meta-generalisation. Building on recent work in black-box supervised meta learning, we develop a black-box meta RL system that exhibits these same symmetries. We show through careful experimentation that incorporating these symmetries can lead to algorithms with a greater ability to generalise to unseen action and observation spaces, tasks, and environments.
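To make the input-permutation symmetry concrete, the following is a minimal NumPy sketch, assuming a shared per-dimension encoder combined with mean pooling; it illustrates why weight reuse plus an order-agnostic aggregation yields invariance to input permutations, and is not the architecture used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared encoder, reused for every observation dimension (weight reuse),
# followed by a mean over dimensions (order-agnostic aggregation).
W = rng.normal(size=(1, 16))
b = rng.normal(size=(16,))

def encode(obs):
    # Apply the same weights to each scalar input, then average:
    # the result cannot depend on the ordering of the inputs.
    per_dim = np.tanh(obs[:, None] @ W + b)  # (n_dims, 16)
    return per_dim.mean(axis=0)              # (16,)

obs = rng.normal(size=(5,))
perm = rng.permutation(5)
# Shuffling the observation dimensions leaves the encoding unchanged,
# so a policy built on top is indifferent to input orderings (and, since
# the mean is defined for any n_dims, to input dimensionality as well).
assert np.allclose(encode(obs), encode(obs[perm]))
```

An analogous construction with a shared per-unit decoder gives equivariance over output (action) permutations.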