Meta-learning algorithms can accelerate model-based reinforcement learning (MBRL) by finding an initial set of parameters for the dynamics model such that the model can be trained to match the actual dynamics of the system from only a few data points. However, in the real world, a robot may encounter situations ranging from motor failures to rocky terrain, and its dynamics can differ significantly from one situation to another. In this paper, we first show that when the meta-training situations (the prior situations) have such diverse dynamics, using a single set of meta-trained parameters as a starting point still requires a large number of observations from the real system to learn a useful model of the dynamics. Second, we propose an algorithm called FAMLE that mitigates this limitation by meta-training several initial starting points (i.e., initial parameters) for the model. The robot can then select the most suitable starting point and adapt the model to the current situation with only a few gradient steps. We compare FAMLE with MBRL, MBRL with a dynamics model meta-trained using MAML, and the model-free policy search algorithm PPO on various simulated and real robotic tasks, and show that FAMLE allows the robots to adapt to novel damage in significantly fewer time steps than the baselines.
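The select-then-adapt step described above can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper. The dynamics model is a simple linear map `X @ theta`. The "meta-trained" starting points are hand-made stand-ins rather than the output of meta-training. The most suitable starting point is the one with the lowest prediction error on the few observed transitions. Adaptation uses plain gradient descent. The names `model_loss` and `select_and_adapt`, the learning rate, and the step count are all hypothetical. This is not the FAMLE implementation.

```python
import numpy as np

# Hypothetical illustration only (not the FAMLE implementation):
# a linear one-step dynamics model Y ≈ X @ theta, where each row of X
# stacks state/action features and each row of Y is the predicted next state.

def model_loss(theta, X, Y):
    """Mean over samples of the squared prediction error ||x_i theta - y_i||^2."""
    residual = X @ theta - Y
    return float(np.mean(np.sum(residual ** 2, axis=1)))

def select_and_adapt(priors, X, Y, lr=0.05, steps=5):
    """Pick the starting point with the lowest loss on the observed data
    (an assumed selection criterion), then take a few gradient steps from it."""
    losses = [model_loss(theta, X, Y) for theta in priors]
    best = int(np.argmin(losses))
    theta = priors[best].copy()
    n = X.shape[0]
    for _ in range(steps):
        # Exact gradient of model_loss with respect to theta.
        grad = (2.0 / n) * X.T @ (X @ theta - Y)
        theta = theta - lr * grad
    return best, theta

# Usage: two stand-in "meta-trained" starting points, one close to the
# true dynamics and one far from it, and only 10 observed transitions.
rng = np.random.default_rng(0)
true_theta = rng.normal(size=(3, 2))
X = rng.normal(size=(10, 3))
Y = X @ true_theta
priors = [true_theta + 0.01, true_theta + 5.0]
best, theta = select_and_adapt(priors, X, Y)
print(best, model_loss(priors[best], X, Y), model_loss(theta, X, Y))
```

Selecting among several starting points before adapting is what lets a few gradient steps suffice: here the chosen prior is already close to the true dynamics, so a handful of small steps reduce the remaining error. A single starting point that is far from the current dynamics, like the second prior, would need many more data and updates.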