Recent state-of-the-art methods for human-object interaction (HOI) detection typically build on transformer architectures with two decoder branches: one for detecting human-object pairs and the other for classifying interactions. Such disentangled transformers, however, may suffer from insufficient context exchange between the branches. This leaves relational reasoning, which is critical for discovering HOI instances, without the contextual information it needs. In this work, we propose the multiplex relation network (MUREN), which performs rich context exchange among three decoder branches using unary, pairwise, and ternary relations of human, object, and interaction tokens. The proposed method learns comprehensive relational contexts for discovering HOI instances and achieves state-of-the-art performance on two standard HOI detection benchmarks, HICO-DET and V-COCO.
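The abstract does not say how the unary, pairwise, and ternary relations are computed or fused. As a rough illustration only, the pure-Python sketch below assumes one hypothetical design for a single query. Each relation is a linear projection followed by ReLU: unary relations project each token alone, pairwise relations project a concatenated pair, and the ternary relation projects all three tokens concatenated. Each branch then receives the sum of the relations it participates in as a residual update. The class `ToyMultiplexRelation`, the random weights, concatenation, and summation fusion are all assumptions made for this example. This is not MUREN's actual implementation.

```python
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights (hypothetical; a real model would learn these)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Bias-free dense layer: returns weights @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def add(*vectors: Vector) -> Vector:
    """Elementwise sum of equal-length vectors."""
    return [sum(column) for column in zip(*vectors)]


class ToyMultiplexRelation:
    """Hypothetical relation block over one query's tokens.

    Keys: 'h' = human token, 'o' = object token, 'i' = interaction token.
    """

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Unary relations: one dim -> dim projection per branch.
        self.unary = {k: random_matrix(dim, dim, rng) for k in "hoi"}
        # Pairwise relations: concatenated pair (2*dim) -> dim.
        self.pair = {p: random_matrix(dim, 2 * dim, rng) for p in ("ho", "hi", "oi")}
        # Ternary relation: all three tokens concatenated (3*dim) -> dim.
        self.ternary = random_matrix(dim, 3 * dim, rng)

    def __call__(self, tokens: Dict[str, Vector]) -> Dict[str, Vector]:
        unary = {k: relu(linear(self.unary[k], tokens[k])) for k in "hoi"}
        pairwise = {
            p: relu(linear(self.pair[p], tokens[p[0]] + tokens[p[1]]))  # list concat
            for p in self.pair
        }
        ternary = relu(linear(self.ternary, tokens["h"] + tokens["o"] + tokens["i"]))

        updated = {}
        for k in "hoi":
            # Each branch gathers the two pairwise relations that involve it.
            involved = [pairwise[p] for p in pairwise if k in p]
            context = add(unary[k], *involved, ternary)
            updated[k] = add(tokens[k], context)  # residual context exchange
        return updated


if __name__ == "__main__":
    rng = random.Random(1)
    tokens = {k: [rng.uniform(-1.0, 1.0) for _ in range(4)] for k in "hoi"}
    block = ToyMultiplexRelation(dim=4)
    updated = block(tokens)
    print({k: len(v) for k, v in updated.items()})  # {'h': 4, 'o': 4, 'i': 4}
```

In this toy version every branch's update depends on the other two tokens through the pairwise and ternary terms. That is the kind of cross-branch context exchange the abstract argues two-branch decoders lack. A real model would learn the weights and would likely use attention rather than plain summation.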