Human-Object Interaction (HOI) detection is a fundamental visual task that aims to localize and recognize interactions between humans and objects. Existing works focus on the visual and linguistic features of humans and objects, but they do not capitalize on the high-level, semantic relationships present in the image, which provide crucial contextual and relational knowledge for HOI inference. We propose a novel method, SG2HOI, that exploits this information through the scene graph (SG) for HOI detection. SG2HOI incorporates the SG information in two ways: (1) we embed the scene graph into a global context clue that serves as the scene-specific environmental context; and (2) we build a relation-aware message-passing module that gathers relationships from objects' neighborhoods and transfers them into interactions. Empirical evaluation shows that SG2HOI outperforms state-of-the-art methods on two benchmark HOI datasets: V-COCO and HICO-DET. Code will be available at https://github.com/ht014/SG2HOI.
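To make the second component concrete, the following is a minimal sketch (not the authors' implementation; all names, shapes, and the residual-mean aggregation scheme are illustrative assumptions) of relation-aware message passing over a scene graph: each object's feature is updated by aggregating its neighbors' features, modulated by the embedding of the relation that connects them.

```python
import numpy as np

def relation_aware_message_passing(node_feats, edges, rel_embs):
    """Illustrative relation-aware message passing.

    node_feats: (N, D) array of object features.
    edges:      list of (src, dst, rel_id) triples from the scene graph.
    rel_embs:   (R, D) array of relation embeddings.
    Returns an updated (N, D) feature array.
    """
    N, _ = node_feats.shape
    messages = np.zeros_like(node_feats)
    counts = np.zeros(N)
    for src, dst, rel in edges:
        # Message = neighbor feature modulated (elementwise) by the
        # embedding of the relation linking src -> dst.
        messages[dst] += node_feats[src] * rel_embs[rel]
        counts[dst] += 1
    counts = np.maximum(counts, 1)  # avoid division by zero for isolated nodes
    # Residual update: original feature plus the mean aggregated message.
    return node_feats + messages / counts[:, None]
```

In SG2HOI itself these aggregated, relation-conditioned features would feed the interaction classifier; this sketch only shows the neighborhood-gathering step.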