Transformer-based networks have recently shown great promise on skeleton-based action recognition tasks. Their ability to capture both global and local dependencies is the key to this success, but it also brings quadratic computation and memory costs. Another problem is that previous studies mainly focus on the relationships among individual joints, so they often suffer from noisy skeleton joints caused by noisy sensor inputs or inaccurate estimation. To address these issues, we propose a novel Transformer-based network, IIP-Transformer. Instead of exploiting interactions among individual joints, IIP-Transformer models joint and part interactions simultaneously, and can thus capture both joint-level (intra-part) and part-level (inter-part) dependencies efficiently and effectively. On the data side, we introduce a part-level skeleton data encoding that significantly reduces computational complexity and is more robust to joint-level skeleton noise. In addition, we propose a new part-level data augmentation to improve the performance of the model. On two large-scale datasets, NTU RGB+D 60 and NTU RGB+D 120, the proposed IIP-Transformer achieves state-of-the-art performance with more than 8x lower computational complexity than DSTA-Net, the SOTA Transformer-based method.
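To illustrate why a part-level encoding lowers the quadratic attention cost, the following is a minimal sketch, not the authors' implementation. It assumes a 25-joint skeleton (as in NTU RGB+D), a hypothetical split into 5 parts of 5 joints each with purely illustrative joint indices, and an encoding that simply concatenates the coordinates of each part's joints. The names `PARTS`, `encode_parts`, and `attention_pairs` are introduced here for illustration only.

```python
import numpy as np

# Hypothetical partition: 25 joints split into 5 parts of 5 joints each.
# The indices are illustrative and do not follow any real body-part layout.
PARTS = [list(range(start, start + 5)) for start in range(0, 25, 5)]


def encode_parts(skeleton, parts=PARTS):
    """Map joint tokens of shape (T, J, C) to part tokens of shape (T, P, K*C).

    Each part token is the concatenation of the coordinates of its K joints
    (an assumed encoding, chosen for simplicity).
    """
    num_frames = skeleton.shape[0]
    tokens = [skeleton[:, idx, :].reshape(num_frames, -1) for idx in parts]
    return np.stack(tokens, axis=1)


def attention_pairs(num_tokens):
    """Number of query-key pairs scored by full self-attention."""
    return num_tokens ** 2


T, J, C = 4, 25, 3  # frames, joints, coordinates per joint
skeleton = np.random.default_rng(0).normal(size=(T, J, C))

part_tokens = encode_parts(skeleton)
joint_pairs = attention_pairs(J)           # attention over individual joints
part_pairs = attention_pairs(len(PARTS))   # attention over parts
```

Under these assumptions, attention over parts scores 25 token pairs per frame instead of 625 for joints. This counts pairwise interactions only; it is not the overall complexity figure reported against DSTA-Net, which depends on the full architecture.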