Human interaction recognition is important in many applications. One crucial cue for recognizing an interaction is the set of interactive body parts. In this work, we propose a novel Interaction Graph Transformer (IGFormer) network for skeleton-based interaction recognition, which models the interactive body parts as graphs. More specifically, the proposed IGFormer constructs interaction graphs according to the semantic and distance correlations between the interactive body parts, and enhances the representation of each person by aggregating the information of the interactive body parts based on the learned graphs. Furthermore, we propose a Semantic Partition Module that transforms each human skeleton sequence into a Body-Part-Time sequence to better capture the spatial and temporal information of the skeleton sequence for learning the graphs. Extensive experiments on three benchmark datasets demonstrate that our model outperforms the state-of-the-art by a significant margin.
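To make the idea of an interaction graph over body parts concrete, the sketch below builds a distance-based graph between the body parts of two persons and uses it to aggregate the other person's part features. This is only an illustrative sketch, not the authors' implementation: the joint/part counts, the part assignment, the softmax weighting over negative distances, and all function names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical sizes: J joints per person, grouped into P body parts, C-dim coordinates.
NUM_JOINTS, NUM_PARTS, COORD_DIM = 25, 5, 3
# Hypothetical part assignment: maps each joint index to one of the P body parts.
part_of_joint = np.random.randint(0, NUM_PARTS, size=NUM_JOINTS)

def part_centroids(skeleton):
    """Average the joint coordinates belonging to each body part.

    skeleton: (NUM_JOINTS, COORD_DIM) array of 3D joint positions for one person.
    Returns a (NUM_PARTS, COORD_DIM) array of part centroids.
    """
    centroids = np.zeros((NUM_PARTS, COORD_DIM))
    for p in range(NUM_PARTS):
        centroids[p] = skeleton[part_of_joint == p].mean(axis=0)
    return centroids

def distance_interaction_graph(skel_a, skel_b, temperature=1.0):
    """Build a part-to-part interaction graph from inter-person distances.

    Closer body parts of the two persons receive larger edge weights;
    each row is normalised with a softmax so its weights sum to 1.
    """
    ca, cb = part_centroids(skel_a), part_centroids(skel_b)
    dist = np.linalg.norm(ca[:, None, :] - cb[None, :, :], axis=-1)  # (P, P)
    logits = -dist / temperature
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)  # row-stochastic graph

def aggregate(part_feats_b, graph):
    """Enhance person A's representation with person B's part features, weighted by the graph."""
    return graph @ part_feats_b  # (P, D)

# Toy usage with random skeletons and features.
skel_a = np.random.randn(NUM_JOINTS, COORD_DIM)
skel_b = np.random.randn(NUM_JOINTS, COORD_DIM)
graph = distance_interaction_graph(skel_a, skel_b)
feats_b = np.random.randn(NUM_PARTS, 64)
enhanced_a = aggregate(feats_b, graph)
print(graph.shape, enhanced_a.shape)  # (5, 5) (5, 64)
```

In the full model the graph would also incorporate learned semantic correlations and be applied per frame of the Body-Part-Time sequence; the snippet only illustrates the distance-based component on a single frame.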