State Representation Learning from Demonstration

Ideally, robots would learn their own state and world representations from perception and experience, without supervision. This goal is the main focus of our field of interest, state representation learning (SRL). A compact representation of the state helps robots make sense of the environment they interact with, and the properties of this representation strongly affect how well the agent can adapt. In this article we present an approach based on imitation learning: we train several policies that share the same representation to reproduce various demonstrations. To do so, we use a multi-head neural network in which a shared state representation feeds several task-specific agents (the heads). If the demonstrations are diverse, the trained representation eventually contains the information needed by all tasks while discarding irrelevant information, so it can become a compact state representation that is also useful for new tasks. We call this approach SRLfD (State Representation Learning from Demonstration). Our experiments confirm that a controller taking SRLfD-based representations as input performs better than with representations obtained by other strategies, and that SRLfD promotes more efficient reinforcement learning (RL) than an end-to-end RL strategy.
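
The multi-head idea can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: the shared encoder is a single tanh layer, each task head is linear, each task's demonstrations are given as an (observations, actions) pair of arrays, and training is plain behavior cloning (mean squared error) with full-batch gradient descent. The names `train_srlfd`, `bc_loss` and `encode` are hypothetical.

```python
import numpy as np


def train_srlfd(demos, obs_dim, state_dim, action_dim, lr=0.05, epochs=2000, seed=0):
    """Jointly fit one shared encoder and one linear head per demonstrated task.

    Assumption: demos is a list of (observations, actions) arrays, one pair per task,
    with shapes (n, obs_dim) and (n, action_dim).
    """
    rng = np.random.default_rng(seed)
    W_enc = rng.normal(scale=0.1, size=(obs_dim, state_dim))  # shared state representation
    heads = [rng.normal(scale=0.1, size=(state_dim, action_dim)) for _ in demos]

    for _ in range(epochs):
        grad_enc = np.zeros_like(W_enc)
        for k, (obs, act) in enumerate(demos):
            n = len(obs)
            s = np.tanh(obs @ W_enc)       # state representation shared by all tasks
            err = s @ heads[k] - act       # behavior-cloning residual of head k
            grad_head = s.T @ err * (2.0 / n)
            grad_s = err @ heads[k].T * (2.0 / n)
            # back-propagate through tanh; the encoder accumulates gradients from every task
            grad_enc += obs.T @ (grad_s * (1.0 - s**2))
            heads[k] -= lr * grad_head
        W_enc -= lr * grad_enc
    return W_enc, heads


def bc_loss(W_enc, head, obs, act):
    """Mean squared behavior-cloning error of one head on its demonstrations."""
    pred = np.tanh(obs @ W_enc) @ head
    return float(np.mean(np.sum((pred - act) ** 2, axis=1)))


def encode(W_enc, obs):
    """Frozen state representation that could be fed to a controller for a new task."""
    return np.tanh(obs @ W_enc)
```

Because every head back-propagates into the same `W_enc`, the encoder is pushed toward features that serve all demonstrated tasks at once. After training, `encode(W_enc, obs)` could be frozen and used as the input of a new controller or RL agent, which is the use case the abstract describes.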

Related content

Representation learning

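Representation learning uses training data to learn vector representations, which can overcome the limitations of hand-crafted methods. It is usually divided into two broad categories: unsupervised and supervised representation learning. Most unsupervised methods use the latent variables of an autoencoder (such as a denoising or sparse autoencoder) as the representation. The more recent variational autoencoder (VAE) is more tolerant of noise and outliers. However, inferring the latent structure of given data is almost impossible to do exactly, so several approximate inference strategies have been developed. In addition, some unsupervised representation learning methods aim to approximate a specific similarity measure. One proposed framework performs unsupervised, similarity-preserving representation learning and uses matrix factorization to preserve pairwise DTW (dynamic time warping) similarities. Another learns DTW-preserving shapelets, so that the Euclidean distance in the transformed space approximates the true DTW distance between the original series. Supervised representation learning methods can exploit label information to better capture the semantic structure of the data. Siamese networks and triplet networks are two currently popular models; their goal is to maximize the distance between classes while minimizing the distance within each class.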
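
Two of the ingredients named above can be made concrete. The sketch below is illustrative only and assumes NumPy. `dtw_distance` computes the classic dynamic-time-warping distance with an absolute-difference cost. This is the quantity that the DTW-preserving methods above try to approximate with Euclidean distances; it is not an implementation of those methods. `triplet_loss` is one common hinge formulation of the triplet objective, using Euclidean distances and a hypothetical default margin of 1.0. Both function names are our own.

```python
import numpy as np


def dtw_distance(x, y):
    """Classic DTW distance between two 1-D series, using |x_i - y_j| as the local cost."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of the three admissible warping steps
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: zero once each negative is at least `margin` farther
    from its anchor than the positive is (Euclidean distances, averaged over rows)."""
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0.0 because warping aligns the repeated 2, even though the two series have different lengths. The triplet loss gives no further gradient once a triplet is separated by the margin, which is how such networks enlarge inter-class distances relative to intra-class ones.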