Human-Object Interaction (HOI) detection aims to learn how humans interact with surrounding objects. Previous HOI detection frameworks detect humans, objects, and their corresponding interactions simultaneously with a single predictor. A single shared predictor, however, cannot differentiate the attentive fields of instance-level prediction and relation-level prediction. To solve this problem, we propose a new transformer-based method named Parallel Reasoning Network (PR-Net), which constructs two independent predictors for instance-level localization and relation-level understanding. The former concentrates on instance-level localization by perceiving instances' extremity regions. The latter broadens the scope of the relation region to reach a better relation-level semantic understanding. Extensive experiments and analysis on the HICO-DET benchmark show that PR-Net effectively alleviates this problem. PR-Net achieves competitive results on the HICO-DET and V-COCO benchmarks.
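The idea of two independent predictors can be illustrated with a minimal sketch. This is not the PR-Net implementation: the class `ParallelPredictors`, the feature size, the 8-value box output (a human box and an object box), and the number of verb classes are all hypothetical choices made for illustration. The only property taken from the abstract is that the instance-level and relation-level predictions come from separate heads that share no parameters.

```python
import math
import random


def make_linear(in_dim, out_dim, rng):
    """Return (weights, bias) for a small randomly initialised linear layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def apply_linear(layer, x):
    """Compute W @ x + b for a layer created by make_linear."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ParallelPredictors:
    """Hypothetical stand-in: two prediction heads with no shared parameters."""

    def __init__(self, feat_dim, num_verbs, seed=0):
        rng = random.Random(seed)
        # Instance-level head (assumed output): human box + object box,
        # each as normalised (cx, cy, w, h).
        self.instance_head = make_linear(feat_dim, 8, rng)
        # Relation-level head (assumed output): one logit per interaction class.
        self.relation_head = make_linear(feat_dim, num_verbs, rng)

    def predict(self, instance_feat, relation_feat):
        # Each head reads its own feature vector, so the two predictions
        # are free to attend to different image regions upstream.
        boxes = [sigmoid(z) for z in apply_linear(self.instance_head, instance_feat)]
        verb_logits = apply_linear(self.relation_head, relation_feat)
        return {
            "human_box": boxes[:4],
            "object_box": boxes[4:],
            "verb_logits": verb_logits,
        }


# Usage with random placeholder features (no real image data involved).
model = ParallelPredictors(feat_dim=16, num_verbs=5)
feat_rng = random.Random(1)
instance_feat = [feat_rng.gauss(0, 1) for _ in range(16)]
relation_feat = [feat_rng.gauss(0, 1) for _ in range(16)]
out = model.predict(instance_feat, relation_feat)
```

In a shared-predictor design, both outputs would be computed from the same feature vector by the same head; here the two feature vectors and the two heads are kept apart, which is the separation the abstract describes.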