Skeleton sequences are lightweight and compact, and thus are ideal candidates for action recognition on edge devices. Recent skeleton-based action recognition methods extract features from 3D joint coordinates as spatial-temporal cues, using these representations in a graph neural network for feature fusion to boost recognition performance. The use of first- and second-order features, i.e., joint and bone representations, has led to high accuracy. Nonetheless, many models are still confused by actions that have similar motion trajectories. To address this issue, we propose fusing higher-order features, in the form of angular encodings, into modern architectures to robustly capture the relationships between joints and body parts. This simple fusion with popular spatial-temporal graph neural networks achieves new state-of-the-art accuracy on two large benchmarks, NTU60 and NTU120, while using fewer parameters and less run time. Our source code is publicly available at: https://github.com/ZhenyueQin/Angular-Skeleton-Encoding.
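To make the idea of angular encoding concrete, below is a minimal sketch of how a higher-order angular feature could be computed from 3D joint coordinates. It assumes the angular feature is the cosine of the angle between two bone vectors meeting at a centre joint; the specific joint triples and the array layout here are illustrative choices, not the exact angle definitions used in the paper or its released code.

```python
import numpy as np

# Hypothetical (centre, neighbour_a, neighbour_b) joint triples; the angle is
# measured between the two bone vectors that point from the centre joint to
# its two neighbours. Indices are illustrative, not tied to a specific dataset.
ANGLE_TRIPLES = [
    (3, 2, 4),
    (5, 4, 6),
    (9, 8, 10),
]

def angular_encoding(joints: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Compute cosine-of-angle features from 3D joint coordinates.

    joints: array of shape (T, V, 3) -- T frames, V joints, xyz coordinates.
    Returns an array of shape (T, len(ANGLE_TRIPLES)), one angular feature
    per (centre, a, b) triple per frame.
    """
    feats = []
    for centre, a, b in ANGLE_TRIPLES:
        v1 = joints[:, a, :] - joints[:, centre, :]   # bone vector centre -> a
        v2 = joints[:, b, :] - joints[:, centre, :]   # bone vector centre -> b
        cos = (v1 * v2).sum(-1) / (
            np.linalg.norm(v1, axis=-1) * np.linalg.norm(v2, axis=-1) + eps
        )
        feats.append(cos)
    return np.stack(feats, axis=-1)

# Usage example: a random 64-frame, 25-joint skeleton sequence.
seq = np.random.randn(64, 25, 3).astype(np.float32)
print(angular_encoding(seq).shape)  # (64, 3)
```

Features of this form could then be concatenated with the joint and bone streams before being fed to a spatial-temporal graph neural network.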