Humans often think of complex tasks as combinations of simpler subtasks in order to learn those complex tasks more efficiently. For example, a backflip could be considered a combination of four subskills: jumping, tucking knees, rolling backwards, and thrusting arms downwards. Motivated by this line of reasoning, we propose a new algorithm that trains neural network policies on simple, easy-to-learn skills in order to cultivate latent spaces that accelerate imitation learning of complex, hard-to-learn skills. We focus on the case in which the complex task comprises a concurrent (and possibly sequential) combination of the simpler subtasks, and therefore our algorithm can be seen as a novel approach to concurrent hierarchical imitation learning. We evaluate our algorithm on difficult tasks in a high-dimensional environment and find that it consistently outperforms a state-of-the-art baseline in training speed and overall performance.