In recent years, skeleton-based action recognition built on multimodal Graph Convolutional Networks (GCNs) has achieved remarkable results. However, because of their deep structure and reliance on continuous floating-point operations, GCN-based methods are energy-intensive. To address this issue, we propose a Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation (MK-SGN). By combining the energy efficiency of Spiking Neural Networks (SNNs) with the graph representation capability of GCNs, MK-SGN reduces energy consumption while maintaining recognition accuracy. First, we convert GCNs into Spiking Graph Convolutional Networks (SGNs), establishing a new benchmark and paving the way for future research. As part of this conversion, we introduce a spiking attention mechanism and design a Spiking-Spatio Graph Convolution module with a Spatial Global Spiking Attention mechanism (SA-SGC), which enhances feature learning capability. Second, we propose a Spiking Multimodal Fusion (SMF) module that leverages mutual information to process multimodal data more efficiently. Finally, we investigate knowledge distillation from multimodal GCNs to SGNs and propose a novel, integrated method that combines intermediate-layer distillation with soft-label distillation to improve SGN performance. On three challenging datasets for skeleton-based action recognition, MK-SGN consumes less energy than state-of-the-art GCN-based frameworks and achieves higher accuracy than state-of-the-art SNN frameworks. Specifically, our method reduces energy consumption by more than 98% compared with typical GCN-based methods, while maintaining competitive accuracy on the NTU-RGB+D 60 cross-subject split using four time steps.
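To make the integrated distillation idea concrete, the following is a minimal sketch of how a soft-label term and an intermediate-layer term might be combined into one objective. It is not the loss used in MK-SGN: the temperature-scaled KL divergence, the mean-squared feature term, the weighting factor `alpha`, the use of time-averaged firing rates as student features, and all function names are assumptions made purely for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def firing_rate(spike_train):
    """Average binary spikes over time steps.

    spike_train[t][i] is 0 or 1 for neuron i at time step t (assumed layout).
    """
    steps = len(spike_train)
    return [sum(column) / steps for column in zip(*spike_train)]


def soft_label_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    # The T^2 factor is the usual rescaling for softened targets (assumed here).
    return kl * temperature ** 2


def feature_loss(student_feat, teacher_feat):
    """Mean-squared error between equal-length intermediate feature vectors."""
    if len(student_feat) != len(teacher_feat):
        raise ValueError("feature vectors must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)


def distillation_loss(student_logits, teacher_logits,
                      student_feat, teacher_feat,
                      alpha=0.5, temperature=2.0):
    """Weighted sum of the soft-label and intermediate-layer terms."""
    soft = soft_label_loss(student_logits, teacher_logits, temperature)
    feat = feature_loss(student_feat, teacher_feat)
    return alpha * soft + (1.0 - alpha) * feat


# Hypothetical usage: a student observed for four time steps, two neurons.
spikes = [[1, 0], [1, 1], [0, 0], [1, 0]]
student_features = firing_rate(spikes)
loss = distillation_loss([2.0, 0.5, -1.0], [2.5, 0.0, -1.5],
                         student_features, [0.8, 0.2])
```

In this sketch the spiking student's binary activations are averaged over the time steps before being compared with the teacher's real-valued features, which is one simple way to bridge the two representations; other alignment choices are equally plausible.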