Humans intuitively solve tasks in versatile ways, varying their behavior both in their overall trajectory plan and in individual steps. This allows them to generalize easily and to adapt to new and changing environments. Current imitation learning algorithms often consider only unimodal expert demonstrations and act in a state-action setting, which makes it difficult for them to imitate human behavior when the demonstrations are versatile. Instead, we combine a mixture of movement primitives with a distribution-matching objective to learn versatile behaviors that match the expert's behavior and versatility. To facilitate generalization to novel task configurations, we do not directly match the agent's and the expert's trajectory distributions but instead work with concise geometric descriptors that generalize well to unseen task configurations. We empirically validate our method on various robot tasks using versatile human demonstrations and compare against imitation learning algorithms in a state-action setting as well as a trajectory-based setting. We find that the geometric descriptors greatly help generalization to new task configurations and that combining them with our distribution-matching objective is crucial for representing and reproducing versatile behavior.
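The abstract only sketches the approach at a high level. As a purely illustrative, hypothetical sketch (not the paper's implementation), the snippet below shows the ingredients it names: a ProMP-style movement primitive decoded from basis-function weights, a mixture over those weights to produce versatile behaviors, a concise geometric descriptor of each trajectory, and a distribution-matching objective between agent and expert descriptor sets. The specific choices here are assumptions: the "final point + midpoint" descriptor is a placeholder for the paper's geometric descriptors, and squared MMD stands in for the unspecified matching objective; all function names (`rbf_basis`, `rollout`, `descriptor`, `mmd2`) are invented for this example.

```python
import numpy as np

# ---- Movement primitive: linear combination of RBF basis functions ----
# (ProMP-style parameterisation; the primitive used in the paper may differ.)
def rbf_basis(t, n_basis=10, width=0.05):
    centers = np.linspace(0, 1, n_basis)
    phi = np.exp(-0.5 * (t[:, None] - centers[None, :]) ** 2 / width)
    return phi / phi.sum(axis=1, keepdims=True)

def rollout(weights, n_steps=100):
    """Decode one weight vector into a 2-D trajectory (hypothetical planar task)."""
    t = np.linspace(0, 1, n_steps)
    phi = rbf_basis(t)                 # (n_steps, n_basis)
    w = weights.reshape(-1, 2)         # (n_basis, 2): x/y weights
    return phi @ w                     # (n_steps, 2)

# ---- Mixture over primitive weights: one component per behavior mode ----
def sample_mixture(means, covs, mix_probs, n_samples, rng):
    comps = rng.choice(len(mix_probs), size=n_samples, p=mix_probs)
    return np.stack([rng.multivariate_normal(means[c], covs[c]) for c in comps])

# ---- Concise geometric descriptor (illustrative: final point + midpoint) ----
def descriptor(traj):
    return np.concatenate([traj[-1], traj[len(traj) // 2]])

# ---- Distribution matching via squared MMD between descriptor sets ----
def mmd2(x, y, bandwidth=1.0):
    def k(a, b):
        d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
n_basis, dim = 10, 2
# Two mixture components -> two distinct ways of solving the same task.
means = [rng.normal(size=n_basis * dim), rng.normal(size=n_basis * dim)]
covs = [0.1 * np.eye(n_basis * dim)] * 2
agent_w = sample_mixture(means, covs, [0.5, 0.5], 32, rng)
expert_w = sample_mixture(means, covs, [0.5, 0.5], 32, rng)   # placeholder "expert"

agent_desc = np.stack([descriptor(rollout(w)) for w in agent_w])
expert_desc = np.stack([descriptor(rollout(w)) for w in expert_w])
# In a learning setting, this quantity would be minimised w.r.t. the mixture parameters.
print("MMD^2 between descriptor distributions:", mmd2(agent_desc, expert_desc))
```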