Graph-based reasoning over skeleton data has emerged as a promising approach for human action recognition. However, prior graph-based methods predominantly employ whole temporal sequences as input, so applying them to online inference entails considerable computational redundancy. In this paper, we tackle this issue by reformulating the Spatio-Temporal Graph Convolutional Neural Network (ST-GCN) as a Continual Inference Network, which can perform step-by-step predictions in time without repeated frame processing. To evaluate our method, we create a continual version of ST-GCN, CoST-GCN, alongside two derived methods with different self-attention mechanisms, CoAGCN and CoS-TR. We investigate weight-transfer strategies and architectural modifications for inference acceleration, and perform experiments on the NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400 datasets. While retaining similar predictive accuracy, we observe up to a 109× reduction in time complexity, on-hardware accelerations of 26×, and a 52% reduction in maximum allocated memory during online inference.
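For intuition, below is a minimal, hypothetical sketch of the continual-inference idea applied to the temporal convolution inside an ST-GCN block: rather than re-convolving the whole clip at every time step, a FIFO cache of the last `kernel_size` frame features is kept, and each incoming frame yields exactly one output step. The class name, tensor layout, and buffering scheme are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class ContinualTemporalConv(nn.Module):
    """Minimal sketch of a continual (step-wise) temporal convolution.

    Instead of re-processing a whole clip per prediction, it caches the
    last `kernel_size` frame features and emits one output per new frame.
    Names and structure are illustrative, not the paper's implementation.
    """

    def __init__(self, channels: int, kernel_size: int = 9):
        super().__init__()
        self.kernel_size = kernel_size
        # Temporal conv over joints; weights match an ordinary Conv2d with
        # kernel (kernel_size, 1) in the (T, V) layout used by ST-GCN.
        self.conv = nn.Conv2d(channels, channels, (kernel_size, 1))
        self.buffer = None  # FIFO cache of recent frames: (N, C, k, V)

    def forward_step(self, x: torch.Tensor) -> torch.Tensor:
        # x: one frame of joint features, shape (N, C, V)
        frame = x.unsqueeze(2)  # -> (N, C, 1, V)
        if self.buffer is None:
            # Zero-initialized cache plays the role of temporal padding.
            self.buffer = frame.new_zeros(
                x.shape[0], x.shape[1], self.kernel_size, x.shape[2]
            )
        # Slide the cache one step and append the new frame.
        self.buffer = torch.cat([self.buffer[:, :, 1:], frame], dim=2)
        # One conv over the cached window yields this step's output frame.
        return self.conv(self.buffer).squeeze(2)  # (N, C, V)


if __name__ == "__main__":
    layer = ContinualTemporalConv(channels=64, kernel_size=9)
    stream = torch.randn(300, 1, 64, 25)  # 300 frames, 25 skeleton joints
    for frame in stream:
        out = layer.forward_step(frame)  # one prediction step per frame
```

Because each arriving frame touches the convolution exactly once, the per-step cost is independent of the clip length, which is the source of the redundancy savings the abstract reports for online inference.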