Multi-task Imitation Learning (MIL) aims to train a policy capable of performing a distribution of tasks from multi-task expert demonstrations, which is essential for general-purpose robots. Existing MIL algorithms suffer from low data efficiency and poor performance on complex long-horizon tasks. We develop Multi-task Hierarchical Adversarial Inverse Reinforcement Learning (MH-AIRL) to learn hierarchically-structured multi-task policies, which are better suited to compositional tasks with long horizons and achieve higher expert data efficiency by identifying and transferring reusable basic skills across tasks. To realize this, MH-AIRL effectively synthesizes context-based multi-task learning, AIRL (an IL approach), and hierarchical policy learning. Further, MH-AIRL can be applied to demonstrations without task or skill annotations (i.e., state-action pairs only), which are more accessible in practice. Theoretical justifications are provided for each module of MH-AIRL, and evaluations on challenging multi-task settings demonstrate superior performance and transferability of the multi-task policies learned with MH-AIRL compared to SOTA MIL baselines.