Despite the great progress achieved by transformers in various vision tasks, they remain underexplored for skeleton-based action recognition, with only a few attempts so far. Moreover, existing methods compute pair-wise global self-attention equally for all joints in both the spatial and temporal dimensions, undervaluing the effect of discriminative local joints and short-range temporal dynamics. In this work, we propose a novel Focal and Global Spatial-Temporal Transformer network (FG-STFormer) that is equipped with two key components: (1) FG-SFormer: a spatial transformer coupling focal joints and global parts. It forces the network to focus on modelling correlations among the learned discriminative spatial joints and among human body parts, respectively. The selected focal joints eliminate the negative effect of non-informative joints when accumulating the correlations. Meanwhile, the interactions between the focal joints and body parts are incorporated to enhance the spatial dependencies via mutual cross-attention. (2) FG-TFormer: a focal and global temporal transformer. Dilated temporal convolution is integrated into the global self-attention mechanism to explicitly capture the local temporal motion patterns of joints or body parts, which is found to be vitally important for making the temporal transformer work. Extensive experimental results on three benchmarks, namely NTU-60, NTU-120 and NW-UCLA, show that our FG-STFormer surpasses all existing transformer-based methods and compares favourably with state-of-the-art GCN-based methods.
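To make the two components more concrete, the following is a minimal NumPy sketch of the ideas described above, not the authors' implementation. Several details are assumptions made only for illustration, since the abstract does not specify them: focal joints are chosen by a hypothetical top-k rule over per-joint scores, attention uses no learned query/key/value projections, the temporal convolution is depthwise with an odd kernel size, and the convolution branch is simply added to the self-attention output. All function names and sizes (for example 5 body parts and 16 channels) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention; q: (n, d), k and v: (m, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def select_focal_joints(joints, scores, k):
    """Keep the k highest-scoring joints (assumed top-k rule, original order kept)."""
    idx = np.sort(np.argsort(scores)[-k:])
    return joints[idx], idx

def dilated_temporal_conv(x, kernel, dilation):
    """Depthwise 1D convolution along time with zero padding.

    x: (T, d) frame features, kernel: (K, d) with K odd (assumption).
    """
    T = x.shape[0]
    K = kernel.shape[0]
    pad = dilation * (K - 1) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for i in range(K):
        out += kernel[i] * xp[i * dilation : i * dilation + T]
    return out

def spatial_block(joints, parts, joint_scores, k):
    """Self-attention within focal joints and within parts, then mutual cross-attention."""
    focal, idx = select_focal_joints(joints, joint_scores, k)
    focal = focal + attention(focal, focal, focal)      # focal joints attend to each other
    parts = parts + attention(parts, parts, parts)      # body parts attend to each other
    focal_out = focal + attention(focal, parts, parts)  # joints query parts
    parts_out = parts + attention(parts, focal, focal)  # parts query joints
    return focal_out, parts_out, idx

def temporal_block(x, kernel, dilation):
    """Global self-attention over frames plus a local dilated-convolution branch."""
    return x + attention(x, x, x) + dilated_temporal_conv(x, kernel, dilation)

rng = np.random.default_rng(0)
joints = rng.normal(size=(25, 16))   # 25 skeleton joints, 16 channels (illustrative)
parts = rng.normal(size=(5, 16))     # 5 body parts (hypothetical grouping)
scores = rng.normal(size=25)         # stand-in for learned joint scores
focal_out, parts_out, idx = spatial_block(joints, parts, scores, k=8)

frames = rng.normal(size=(32, 16))   # one joint's features over 32 frames
kernel = rng.normal(size=(3, 16))    # stand-in for learned conv weights
out = temporal_block(frames, kernel, dilation=2)
print(focal_out.shape, parts_out.shape, out.shape)
```

In this sketch the attention branch mixes information across all frames, while the dilated convolution only combines each frame with neighbours a fixed number of steps away, which is one simple way to expose short-range motion patterns alongside global context.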