Robots could learn their own state and a representation of their world from perception and experience, without supervision. This desirable goal is the central focus of our field of interest, state representation learning (SRL). Indeed, a compact representation of such a state helps a robot understand its environment well enough to interact with it, and the properties of this representation have a strong impact on the agent's adaptive capabilities. In this article, we present an approach based on imitation learning. The idea is to train several policies that share the same representation to reproduce various demonstrations. To do so, we use a multi-head neural network in which a shared state representation feeds a set of task-specific heads. If the demonstrations are diverse, the trained representation will eventually contain the information necessary for all tasks, while discarding irrelevant information. As such, it can become a compact state representation that is useful for new tasks. We call this approach SRLfD (State Representation Learning from Demonstration). Our experiments confirm that a controller taking SRLfD-based representations as input achieves better performance than with other representation strategies, and promotes more sample-efficient reinforcement learning (RL) than an end-to-end RL strategy.
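Since the core idea is architectural, a shared encoder feeding several task-specific policy heads trained by imitation, a minimal sketch may help make it concrete. The following is only an illustration under assumed PyTorch conventions; all module names, layer sizes, and the behavioral-cloning loss are hypothetical choices, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Maps a raw observation to a compact state vector (the learned representation)."""
    def __init__(self, obs_dim: int, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, state_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class SRLfDNet(nn.Module):
    """Shared representation feeding one task-specific policy head per task."""
    def __init__(self, obs_dim: int, state_dim: int, action_dim: int, n_tasks: int):
        super().__init__()
        self.encoder = SharedEncoder(obs_dim, state_dim)
        self.heads = nn.ModuleList(
            [nn.Linear(state_dim, action_dim) for _ in range(n_tasks)]
        )

    def forward(self, obs: torch.Tensor, task_id: int) -> torch.Tensor:
        state = self.encoder(obs)          # shared, compact representation
        return self.heads[task_id](state)  # task-specific action prediction

# Behavioral-cloning-style update: each head imitates the demonstrations of
# its own task, while gradients from all tasks shape the shared encoder.
model = SRLfDNet(obs_dim=64, state_dim=8, action_dim=4, n_tasks=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Dummy (observation, demonstrated action) batches standing in for real demos.
demos = [(torch.randn(32, 64), torch.randn(32, 4)) for _ in range(3)]
for task_id, (obs, demo_actions) in enumerate(demos):
    loss = loss_fn(model(obs, task_id), demo_actions)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because every task's imitation loss backpropagates through the same encoder, the representation is pushed to retain what all the demonstrated tasks need and to drop task-irrelevant detail, which is the intuition behind reusing it for new tasks.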