Human-Object Interaction (HOI) detection, which aims to localize a human and a relevant object while recognizing their interaction, is crucial for understanding a still image. Recently, transformer-based models have significantly advanced HOI detection. However, the capability of these models has not been fully explored, because the Object Query of the model is always simply initialized to zeros, which can hurt performance. In this paper, we study how to improve transformer-based HOI detectors by initializing the Object Query with category-aware semantic information. To this end, we propose the Category-Aware Transformer Network (CATN). Specifically, the Object Query is initialized with category priors provided by an external object detection model to yield better performance. Moreover, these category priors can be further used to enhance the representation ability of features via the attention mechanism. We first verify our idea through an Oracle experiment in which the Object Query is initialized with ground-truth category information. Extensive experiments then show that an HOI detection model equipped with our idea outperforms the baseline by a large margin and achieves a new state-of-the-art result.
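To make the contrast between zero initialization and category-aware initialization concrete, the following is a minimal sketch and not the CATN implementation. It assumes that each object category has an embedding vector, that an external detector returns a list of category indices for the image, and that the prior is formed by averaging the embeddings of the detected categories. The function name `init_object_queries`, the averaging step, and the toy embedding table are all hypothetical choices made for illustration.

```python
def init_object_queries(detected_categories, embeddings, num_queries):
    """Return num_queries query vectors initialized from category priors.

    Assumption (not from the paper): the prior is the mean of the
    embeddings of the categories reported by an external detector.
    With no detections, fall back to the all-zero initialization.
    """
    dim = len(embeddings[0])
    if not detected_categories:
        # Baseline behavior: every Object Query starts as a zero vector.
        return [[0.0] * dim for _ in range(num_queries)]

    # Sum the embeddings of the detected categories, then average them.
    prior = [0.0] * dim
    for category in detected_categories:
        for i, value in enumerate(embeddings[category]):
            prior[i] += value
    prior = [value / len(detected_categories) for value in prior]

    # Every query gets its own copy of the category-aware prior.
    return [list(prior) for _ in range(num_queries)]


# Toy embedding table for three hypothetical categories, dimension 2.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

category_aware = init_object_queries([0, 2], toy_embeddings, num_queries=3)
zero_init = init_object_queries([], toy_embeddings, num_queries=3)
```

In this sketch every query shares the same prior. A real model would pass these vectors to the transformer decoder in place of the zero tensor, and how the priors are actually encoded and combined is a design choice that the abstract does not specify.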