Introducing Symmetries to Black Box Meta Reinforcement Learning

Meta reinforcement learning (RL) attempts to discover new RL algorithms automatically from environment interaction. In so-called black-box approaches, the policy and the learning algorithm are jointly represented by a single neural network. These methods are very flexible, but they tend to underperform in terms of generalisation to new, unseen environments. In this paper, we explore the role of symmetries in meta-generalisation. We show that a recent successful meta RL approach that meta-learns an objective for backpropagation-based learning exhibits certain symmetries (specifically the reuse of the learning rule, and invariance to input and output permutations) that are not present in typical black-box meta RL systems. We hypothesise that these symmetries can play an important role in meta-generalisation. Building off recent work in black-box supervised meta learning, we develop a black-box meta RL system that exhibits these same symmetries. We show through careful experimentation that incorporating these symmetries can lead to algorithms with a greater ability to generalise to unseen action & observation spaces, tasks, and environments.
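
The two symmetries named in the abstract (reuse of one learning rule everywhere, and invariance to input and output permutations) can be illustrated with a toy example. The sketch below is not the system described in the paper. It assumes a single linear layer with a tanh output and a hypothetical three-parameter local update rule (`shared_rule`, `theta`), chosen only to show that when the same rule is applied to every connection, shuffling the observation and action dimensions merely shuffles the result.

```python
import math
import random


def shared_rule(w, pre, post, reward, theta):
    """One update rule reused for every connection (hypothetical form)."""
    a, b, c = theta
    return w + a * reward * pre * post + b * pre + c * post


def forward(weights, obs):
    """out[j] = tanh(sum_i obs[i] * weights[i][j])."""
    n_out = len(weights[0])
    return [
        math.tanh(sum(obs[i] * weights[i][j] for i in range(len(obs))))
        for j in range(n_out)
    ]


def learn_step(weights, obs, reward, theta):
    """Apply the same shared rule to each connection (i, j)."""
    out = forward(weights, obs)
    return [
        [shared_rule(weights[i][j], obs[i], out[j], reward, theta)
         for j in range(len(out))]
        for i in range(len(obs))
    ]


def permute(seq, perm):
    """Reorder seq so that position k holds seq[perm[k]]."""
    return [seq[p] for p in perm]


rng = random.Random(0)
n_in, n_out = 4, 3
theta = (0.5, 0.01, -0.02)  # assumed meta-parameters, shared by all connections
weights = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]
obs = [rng.uniform(-1, 1) for _ in range(n_in)]
in_perm = [2, 0, 3, 1]   # shuffle of observation dimensions
out_perm = [1, 2, 0]     # shuffle of action dimensions

# Learn and act in the original ordering.
new_w = learn_step(weights, obs, 1.0, theta)
out = forward(new_w, obs)

# Learn and act with observations, actions, and initial weights shuffled.
p_weights = [permute(row, out_perm) for row in permute(weights, in_perm)]
p_obs = permute(obs, in_perm)
p_new_w = learn_step(p_weights, p_obs, 1.0, theta)
p_out = forward(p_new_w, p_obs)

# The shuffled run matches the original run up to the output shuffle.
max_diff = max(abs(x - y) for x, y in zip(permute(out, out_perm), p_out))
print(max_diff < 1e-9)
```

Because `theta` does not depend on the connection indices, nothing in the update can single out a particular observation or action dimension. A network with separate parameters per unit would not, in general, pass this check.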
