Human-Object Interaction (HOI) detection plays a crucial role in activity understanding. Although significant progress has been made, interactiveness learning remains a challenging problem in HOI detection: existing methods usually generate redundant negative human-object (H-O) pair proposals and fail to extract interactive pairs effectively. Interactiveness has been studied at both the whole-body and body-part levels and facilitates H-O pairing, but previous works focus on one target person at a time (i.e., a local perspective) and overlook information from the other persons in the image. In this paper, we argue that comparing the body parts of multiple persons simultaneously provides additional, complementary interactiveness cues. We therefore learn body-part interactiveness from a global perspective: when classifying a target person's body-part interactiveness, visual cues are drawn not only from the target person but also from the other persons in the image. We construct body-part saliency maps based on self-attention to mine informative cross-person cues and learn the holistic relationships among all body parts. We evaluate the proposed method on the widely used benchmarks HICO-DET and V-COCO. With this new perspective, holistic global-local body-part interactiveness learning achieves significant improvements over the state of the art. Our code is available at https://github.com/enlighten0707/Body-Part-Map-for-Interactiveness.
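To make the global perspective concrete, the following is a minimal sketch of self-attention over the body parts of every person in an image. It is not the released implementation: the function name `cross_person_part_attention`, the feature layout, the use of identity query/key/value projections, and the choice of 3 persons, 6 parts, and 8-dimensional features are all assumptions made for illustration. A real model would use learned projections and features from a visual backbone.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_person_part_attention(part_feats):
    """Scaled dot-product self-attention over all (person, part) tokens.

    part_feats[p][k] is the feature vector of body part k of person p.
    Returns (attended, attn), where attn[i][j] is the weight that token i
    places on token j. Tokens are ordered person by person, part by part.
    """
    # One token per (person, part), so each part can attend across persons.
    tokens = [feat for person in part_feats for feat in person]
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    # Assumption: identity projections stand in for learned Q/K/V matrices.
    attn = [softmax([dot(q, k) / scale for k in tokens]) for q in tokens]
    attended = [
        [sum(w * v[d] for w, v in zip(row, tokens)) for d in range(dim)]
        for row in attn
    ]
    return attended, attn


random.seed(0)
num_persons, num_parts, dim = 3, 6, 8  # hypothetical sizes
part_feats = [
    [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_parts)]
    for _ in range(num_persons)
]
attended, attn = cross_person_part_attention(part_feats)

# Share of attention that person 0's first part spends on other persons.
cross_mass = sum(attn[0][num_parts:])
print(f"attention from person 0, part 0 to other persons: {cross_mass:.3f}")
```

Each row of `attn` can be read as a rough saliency distribution over all body parts in the image. A purely local model would restrict each row to the target person's own parts, whereas here the weight falling on other persons' parts is what carries the cross-person cue.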