Human-Object Interaction (HOI) detection has received considerable attention in the context of scene understanding. Despite steady progress on benchmarks, we observe that existing methods often perform poorly on distant interactions, for two main reasons: 1) Distant interactions are inherently harder to recognize than close ones. A natural scene often involves multiple humans and objects with intricate spatial relations, so recognizing the interaction of a distant human-object pair is strongly affected by complex visual context. 2) Benchmark datasets contain too few distant interactions, which results in under-fitting on these instances. To address these problems, we propose a novel two-stage method for better handling distant interactions in HOI detection. One essential component of our method is a novel Far Near Distance Attention module, which enables information propagation between humans and objects while taking their spatial distance into account. In addition, we devise a novel Distance-Aware loss function that leads the model to focus more on distant yet rare interactions. We conduct extensive experiments on two challenging datasets, HICO-DET and V-COCO. The results demonstrate that the proposed method surpasses existing approaches by a large margin, achieving new state-of-the-art performance.