Object handover is a common human collaboration behavior that attracts attention from researchers in Robotics and Cognitive Science. Though visual perception plays an important role in the object handover task, the whole handover process has rarely been explored. In this work, we propose a novel richly annotated dataset, H2O, for visual analysis of human-human object handovers. H2O, which contains 18K video clips involving 15 people who hand over 30 objects to each other, is a multi-purpose benchmark. It can support several vision-based tasks, among which we specifically provide a baseline method, RGPNet, for a less-explored task named Receiver Grasp Prediction. Extensive experiments show that RGPNet can produce plausible grasps based on the giver's hand-object states in the pre-handover phase. In addition, we report hand and object pose errors with existing baselines and show that the dataset can serve as video demonstrations for robot imitation learning on the handover task. The dataset, model, and code will be made public.