Unlike most previous human-object interaction (HOI) methods, which focus on learning better human-object features, we propose a novel and complementary approach called category query learning. The queries are explicitly associated with interaction categories, converted to image-specific category representations via a transformer decoder, and learnt via an auxiliary image-level classification task. The idea is motivated by an earlier multi-label image classification method, but is applied here for the first time to the challenging task of human-object interaction classification. Our method is simple, general and effective. It is validated on three representative HOI baselines and achieves new state-of-the-art results on two benchmarks.
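To make the data flow of category query learning concrete, the following is a minimal sketch, not the authors' implementation. It assumes one query vector per interaction category, replaces the transformer decoder with a single unprojected cross-attention step plus a residual connection, and uses a multi-label binary cross-entropy as the auxiliary image-level loss. All names (`cross_attend`, `image_level_loss`, `category_queries`) and all sizes are hypothetical, and the abstract does not specify how the resulting category representations are used by the HOI baseline, so that step is omitted.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, features):
    """Single-head cross-attention without learned projections (an assumption;
    a real transformer decoder has projections, several heads and layers).
    queries: C x d category queries, features: N x d image tokens.
    Returns C x d image-specific category representations."""
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, f) / math.sqrt(d) for f in features])
        attended = [sum(w * f[k] for w, f in zip(weights, features))
                    for k in range(d)]
        # Residual connection: the output stays tied to its category query.
        out.append([qk + ak for qk, ak in zip(q, attended)])
    return out


def image_level_loss(category_repr, classifier_weights, labels):
    """Auxiliary multi-label loss: one binary decision per interaction
    category, i.e. 'does this interaction occur anywhere in the image?'"""
    loss = 0.0
    for rep, w, y in zip(category_repr, classifier_weights, labels):
        p = 1.0 / (1.0 + math.exp(-dot(rep, w)))
        p = min(max(p, 1e-7), 1.0 - 1e-7)  # clip to avoid log(0)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss / len(labels)


# Toy sizes and random values, chosen only for illustration.
rng = random.Random(0)
num_categories, num_tokens, dim = 4, 6, 8


def random_matrix(rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


category_queries = random_matrix(num_categories, dim)    # learnable in practice
image_features = random_matrix(num_tokens, dim)          # backbone output in practice
classifier_weights = random_matrix(num_categories, dim)  # learnable in practice
labels = [1, 0, 0, 1]  # image-level ground truth: which interactions are present

category_repr = cross_attend(category_queries, image_features)
loss = image_level_loss(category_repr, classifier_weights, labels)
print(len(category_repr), len(category_repr[0]))
```

In a trainable version, the queries and classifier weights would be parameters updated by gradient descent on this auxiliary loss together with the baseline's own HOI losses; the sketch only runs the forward pass.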