Deep imitation learning requires many expert demonstrations, which can be hard to obtain, especially when many tasks are involved. However, different tasks often share similarities, so learning them jointly can greatly benefit learning and reduce the number of demonstrations required. However, joint multi-task learning often suffers from negative transfer, sharing information that should be task-specific. In this work, we introduce a method to perform multi-task imitation while allowing for task-specific features. This is done by using proto-policies as modules to divide the tasks into simple sub-behaviours that can be shared. The proto-policies operate in parallel and are adaptively chosen by a selector mechanism that is jointly trained with the modules. Experiments on different sets of tasks show that our method improves upon the accuracy of single agents, task-conditioned and multi-headed multi-task agents, as well as state-of-the-art meta-learning agents. We also demonstrate the method's ability to autonomously divide the tasks into both shared and task-specific sub-behaviours.
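To make the modular architecture concrete, the following is a minimal NumPy sketch of the idea described above: several proto-policy modules run in parallel on the same observation, and a selector produces adaptive weights that blend their outputs. All dimensions, the linear module/selector parameterisation, and the one-hot task embedding are illustrative assumptions, not the paper's actual implementation; in the paper both the modules and the selector would be trained jointly from expert demonstrations.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM, ACT_DIM, N_MODULES, TASK_DIM = 8, 2, 4, 3  # illustrative sizes

# Hypothetical proto-policy modules: each a small linear map obs -> action.
# (Random placeholder weights; in practice learned via the imitation loss.)
module_weights = [rng.normal(size=(ACT_DIM, OBS_DIM)) for _ in range(N_MODULES)]

# Hypothetical selector: maps (obs, task embedding) -> logits over modules.
selector_weights = rng.normal(size=(N_MODULES, OBS_DIM + TASK_DIM))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def act(obs, task_emb):
    """Run all proto-policies in parallel; blend them with selector weights."""
    logits = selector_weights @ np.concatenate([obs, task_emb])
    w = softmax(logits)                                     # adaptive choice
    actions = np.stack([W @ obs for W in module_weights])   # (N_MODULES, ACT_DIM)
    return w @ actions                                      # weighted combination

obs = rng.normal(size=OBS_DIM)
task_emb = np.eye(TASK_DIM)[0]   # one-hot task embedding (assumption)
action = act(obs, task_emb)
```

A soft (softmax) selection is used here so the blend stays differentiable end-to-end; a hard selector would instead pick a single module per step, which is closer to a discrete division into sub-behaviours.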