We present a method for learning a human-robot collaboration policy from human-human collaboration demonstrations. An effective robot assistant must learn to handle the diverse human behaviors shown in the demonstrations and remain robust when humans adjust their strategies during online task execution. Our method co-optimizes a human policy and a robot policy in an interactive learning process: the human policy learns to generate diverse and plausible collaborative behaviors from demonstrations, while the robot policy learns to assist by estimating the unobserved latent strategy of its human collaborator. Across a 2D strategy game, a human-robot handover task, and a multi-step collaborative manipulation task, our method outperforms the alternatives both in simulated evaluations and when executing the tasks with a real human operator in the loop. Supplementary materials and videos are available at https://sites.google.com/view/co-gail-web/home