Detecting human-object interactions is essential for a comprehensive understanding of visual scenes. In particular, the spatial connections between humans and objects are important cues for reasoning about interactions. To this end, we propose a skeleton-aware graph convolutional network for human-object interaction detection, named SGCN4HOI. Our network exploits the spatial connections between human keypoints and object keypoints to capture their fine-grained structural interactions via graph convolutions. It fuses these geometric features with visual features and spatial configuration features obtained from human-object pairs. Furthermore, to better preserve object structural information and facilitate human-object interaction detection, we propose a novel skeleton-based object keypoint representation. We evaluate SGCN4HOI on the public V-COCO benchmark dataset. Experimental results show that the proposed approach outperforms state-of-the-art pose-based models and achieves competitive performance against other models.
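To make the idea of graph convolutions over human and object keypoints concrete, the following is a minimal sketch and not the SGCN4HOI implementation. It assumes a generic symmetric-normalized graph convolution, ReLU(Â X W), applied to a toy graph with three human keypoints and two object keypoints. The keypoint counts, the edge lists (including the human-to-object edges), the feature sizes, and the mean pooling at the end are all hypothetical choices made for illustration; the abstract does not specify any of them.

```python
import numpy as np

def normalized_adjacency(num_nodes, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected graph."""
    a = np.eye(num_nodes)  # self-loops keep each node's own feature
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_conv(x, a_hat, w):
    """One graph convolution layer: ReLU(a_hat @ x @ w)."""
    return np.maximum(a_hat @ x @ w, 0.0)

# Hypothetical toy graph: nodes 0-2 are human keypoints, nodes 3-4 are object keypoints.
human_edges = [(0, 1), (1, 2)]   # assumed human skeleton links
object_edges = [(3, 4)]          # assumed object skeleton link
inter_edges = [(2, 3), (2, 4)]   # assumed human-to-object links (e.g. a hand to the object)
edges = human_edges + object_edges + inter_edges

rng = np.random.default_rng(0)
x = rng.random((5, 2))            # one (x, y) coordinate per keypoint
w = rng.standard_normal((2, 8))   # random weights standing in for learned ones

a_hat = normalized_adjacency(5, edges)
h = graph_conv(x, a_hat, w)       # per-keypoint geometric features, shape (5, 8)
pair_feature = h.mean(axis=0)     # assumed pooling into one human-object pair vector
```

In a full model, a pooled vector such as `pair_feature` would be combined with the visual and spatial configuration features of the same human-object pair before classification; how that fusion is done is not described in the abstract.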