Modeling multimodal human behavior has been a key barrier to increasing the level of interaction between human and robot, particularly for collaborative tasks. Our key insight is that an effective, learned robot policy used for human-robot collaborative tasks must be able to express a high degree of multimodality, predict actions in a temporally consistent manner, and recognize a wide range of frequencies of human actions in order to seamlessly integrate with a human in the control loop. We present Diffusion Co-policy, a method for planning sequences of actions that synergize well with humans during test time. The co-policy predicts joint human-robot action sequences via a Transformer-based diffusion model, which is trained on a dataset of collaborative human-human demonstrations, and directly executes the robot actions in a receding horizon control framework. We demonstrate in both simulation and real environments that the method outperforms other state-of-art learning methods on the task of human-robot table-carrying with a human in the loop. Moreover, we qualitatively highlight compelling robot behaviors that demonstrate evidence of true human-robot collaboration, including mutual adaptation, shared task understanding, leadership switching, and low levels of wasteful interaction forces arising from dissent.
翻译:暂无翻译