Meta-reinforcement learning algorithms can enable autonomous agents, such as robots, to quickly acquire new behaviors by leveraging prior experience in a set of related training tasks. However, the onerous data requirements of meta-training compounded with the challenge of learning from sensory inputs such as images have made meta-RL challenging to apply to real robotic systems. Latent state models, which learn compact state representations from a sequence of observations, can accelerate representation learning from visual inputs. In this paper, we leverage the perspective of meta-learning as task inference to show that latent state models can \emph{also} perform meta-learning given an appropriately defined observation space. Building on this insight, we develop meta-RL with latent dynamics (MELD), an algorithm for meta-RL from images that performs inference in a latent state model to quickly acquire new skills given observations and rewards. MELD outperforms prior meta-RL methods on several simulated image-based robotic control problems, and enables a real WidowX robotic arm to insert an Ethernet cable into new locations given a sparse task completion signal after only $8$ hours of real world meta-training. To our knowledge, MELD is the first meta-RL algorithm trained in a real-world robotic control setting from images.