Understanding human intentions during interactions has been a long-standing research theme with applications in human-robot interaction, virtual reality, and surveillance. In this study, we focus on full-body human interactions with large-sized daily objects and aim to predict the future states of objects and humans given a sequential observation of the human-object interaction. As no existing dataset is dedicated to full-body human interactions with large-sized daily objects, we collected a large-scale dataset containing thousands of interactions for training and evaluation. We also observe that an object's intrinsic physical properties are useful for object motion prediction, and thus design a set of object dynamic descriptors to encode these intrinsic properties. We treat the object dynamic descriptors as a new modality and propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task. We show that the proposed network, by consuming the dynamic descriptors, achieves state-of-the-art prediction results and generalizes better to unseen objects. We also demonstrate that the predicted results are useful for human-robot collaboration.
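To make the fusion idea concrete, the following is a minimal sketch of one graph-convolution step that combines the two modalities. The abstract does not specify HO-GCN's internals, so everything here is an assumption: the function name ho_gcn_layer, the choice to broadcast the object's dynamic descriptor vector to every graph node and fuse it by concatenation, and the standard symmetric normalization are all hypothetical, not the authors' specification.

```python
import numpy as np

def ho_gcn_layer(X, D, A, W):
    """One hypothetical graph-convolution step fusing motion features
    with object dynamic descriptors.

    X : (N, F)       per-node motion features (human joints + object keypoints)
    D : (K,)         object dynamic descriptors (intrinsic physical properties)
    A : (N, N)       adjacency of the human-object graph, with self-loops
    W : (F+K, F_out) learnable projection weights
    """
    N = X.shape[0]
    # Broadcast the same descriptor vector to every node and concatenate,
    # treating the descriptors as an extra input modality.
    H = np.concatenate([X, np.tile(D, (N, 1))], axis=1)  # (N, F+K)
    # Symmetrically normalize the adjacency, as in a standard GCN.
    deg = A.sum(axis=1)
    A_hat = A / np.sqrt(np.outer(deg, deg))
    # Aggregate over neighbors, project, and apply a ReLU nonlinearity.
    return np.maximum(A_hat @ H @ W, 0.0)

# Usage example with random inputs (shapes are illustrative only).
rng = np.random.default_rng(0)
N, F, K, F_out = 25 + 8, 6, 4, 16       # e.g. 25 joints + 8 object keypoints
X = rng.normal(size=(N, F))
D = rng.normal(size=K)
A = np.eye(N)                            # self-loops only, for brevity
out = ho_gcn_layer(X, D, A, rng.normal(size=(F + K, F_out)))
print(out.shape)                         # (33, 16)
```

Concatenating a broadcast descriptor is only one plausible fusion choice; the actual network may inject the descriptors differently (e.g., at multiple layers or via a separate branch).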