Human-Object Interaction (HOI) detection aims to learn how humans interact with surrounding objects. Recent end-to-end HOI detectors lack relation reasoning, which leaves them unable to learn HOI-specific interactive semantics for prediction. In this paper, we therefore propose a novel relation reasoning approach for HOI detection. We first present a progressive Relation-aware Frame, which introduces a new structure and parameter-sharing pattern for interaction inference. On top of this frame, we design an Interaction Intensifier Module and a Correlation Parsing Module, in which a) interactive semantics from humans are exploited and passed to objects to intensify interactions, and b) interactive correlations among humans, objects, and interactions are integrated to improve predictions. Based on these modules, we construct an end-to-end trainable framework named Relation Reasoning Network (abbr. RR-Net). Extensive experiments show that the proposed RR-Net sets a new state of the art on both the V-COCO and HICO-DET benchmarks and improves over the baseline by about 5.5% and 9.8% in relative terms, respectively, validating that this first effort to explore relation reasoning and integrate interactive semantics brings a clear improvement to end-to-end HOI detection.