Human-Object Interaction (HOI) detection, which infers the relationships between humans and objects in images or videos, is a fundamental task for high-level scene understanding. However, HOI detection usually suffers from the open, long-tailed distribution of interactions with objects, whereas humans have a powerful compositional perception ability that lets them recognize rare or unseen HOI samples. Inspired by this, we devise a novel HOI compositional learning framework, termed Fabricated Compositional Learning (FCL), to address the problem of open long-tailed HOI detection. Specifically, we introduce an object fabricator to generate effective object representations, and then combine verbs and fabricated objects to compose new HOI samples. With the proposed object fabricator, we are able to generate large-scale HOI samples for rare and unseen categories to alleviate the open long-tailed issues in HOI detection. Extensive experiments on the most popular HOI detection dataset, HICO-DET, demonstrate the effectiveness of the proposed method for imbalanced HOI detection and show that it significantly improves on the state-of-the-art performance for rare and unseen HOI categories. Code is available at https://github.com/zhihou7/FCL.
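To make the composition step concrete, the following is a minimal sketch, not the FCL implementation. It assumes that an object representation can be fabricated by perturbing a per-class embedding with Gaussian noise, and that an HOI sample is the concatenation of a verb feature and a fabricated object feature. The names `fabricate_object`, `compose_hoi_samples`, `valid_pairs`, and `noise_scale`, as well as the toy feature values, are hypothetical and chosen only for illustration.

```python
import random


def fabricate_object(class_embedding, noise_scale, rng):
    """Hypothetical fabricator: perturb a class embedding with Gaussian noise."""
    return [x + rng.gauss(0.0, noise_scale) for x in class_embedding]


def compose_hoi_samples(verb_features, object_embeddings, valid_pairs,
                        noise_scale=0.1, seed=0):
    """Pair each verb feature with a fabricated object feature.

    Only (verb, object) pairs listed in valid_pairs are composed, so
    implausible combinations are never generated.
    """
    rng = random.Random(seed)
    samples = []
    for verb, verb_feat in verb_features.items():
        for obj, embedding in object_embeddings.items():
            if (verb, obj) not in valid_pairs:
                continue
            fake_obj = fabricate_object(embedding, noise_scale, rng)
            # Assumption: a composed sample is the verb feature
            # concatenated with the fabricated object feature.
            samples.append({"label": (verb, obj),
                            "feature": verb_feat + fake_obj})
    return samples


# Toy inputs (made up for this sketch).
verb_features = {"ride": [0.9, 0.1], "feed": [0.2, 0.8]}
object_embeddings = {"horse": [1.0, 0.0, 0.5], "bicycle": [0.0, 1.0, 0.5]}
valid_pairs = {("ride", "horse"), ("ride", "bicycle"), ("feed", "horse")}

samples = compose_hoi_samples(verb_features, object_embeddings, valid_pairs)
for sample in samples:
    print(sample["label"], len(sample["feature"]))
```

In this sketch a rare category such as ("feed", "horse") receives a composed sample even if it never appears in the training images, which is the intuition behind using fabricated objects to ease the long-tailed distribution.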