In open-ended continuous environments, robots need to learn multiple parameterised control tasks in hierarchical reinforcement learning. We hypothesise that the most complex tasks can be learned more easily by transferring knowledge from simpler tasks, and faster by adapting the complexity of the actions to the task. We propose a task-oriented representation of complex actions, called procedures, to learn task relationships online and to build unbounded sequences of action primitives that control the different observables of the environment. Combining goal-babbling with imitation learning, and active learning with knowledge transfer driven by intrinsic motivation, our algorithm self-organises its learning process. At any given time it chooses which task to focus on, and what, how, when and from whom to transfer knowledge. We show with a simulation and a real industrial robot arm, in cross-task and cross-learner transfer settings, that task composition is key to tackling highly complex tasks. Task decomposition is also efficiently transferred across different embodied learners and through active imitation, where the robot requests only a small number of demonstrations and the appropriate type of information. The robot learns and exploits task dependencies so as to learn tasks of any complexity.
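As an illustrative sketch only: the abstract does not give the exact intrinsic-motivation measure, so the snippet below assumes "interest" is approximated by recent progress in competence per (task, strategy) pair, where strategies include autonomous goal-babbling and requesting a demonstration. All names (e.g. IntrinsicallyMotivatedSelector) are hypothetical, not from the paper.

```python
import random
from collections import defaultdict

class IntrinsicallyMotivatedSelector:
    """Minimal sketch of interest-based task and strategy selection.

    Interest is approximated as the absolute change of a competence
    measure over a recent window; this is an illustrative assumption,
    not the paper's exact formula.
    """

    def __init__(self, tasks, strategies, window=10, eps=1e-3):
        self.tasks = tasks                # e.g. ["move_object", "stack"]
        self.strategies = strategies      # e.g. ["autonomous", "imitation"]
        self.window = window
        self.eps = eps                    # floor so every pair keeps being sampled
        self.history = defaultdict(list)  # (task, strategy) -> competence values

    def record(self, task, strategy, competence):
        self.history[(task, strategy)].append(competence)

    def interest(self, task, strategy):
        h = self.history[(task, strategy)][-self.window:]
        if len(h) < 2:
            return self.eps
        # progress = how much competence changed over the recent window
        return abs(h[-1] - h[0]) / len(h) + self.eps

    def choose(self):
        # sample a (task, strategy) pair proportionally to its interest
        pairs = [(t, s) for t in self.tasks for s in self.strategies]
        weights = [self.interest(t, s) for t, s in pairs]
        return random.choices(pairs, weights=weights, k=1)[0]


# Usage: the learner records outcomes, then lets intrinsic motivation decide
# which task to focus on and whether to request a demonstration.
selector = IntrinsicallyMotivatedSelector(
    tasks=["move_object", "stack"], strategies=["autonomous", "imitation"])
selector.record("stack", "imitation", 0.2)
selector.record("stack", "imitation", 0.5)
task, strategy = selector.choose()
```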