Existing vision-based action recognition is susceptible to occlusion and appearance variations, whereas wearable sensors can alleviate these challenges by capturing human motion as one-dimensional time-series signals. For the same action, the knowledge learned from vision sensors and wearable sensors may be related and complementary. However, there are significant modality differences between action data captured by wearable sensors and vision sensors in data dimension, data distribution, and inherent information content. In this paper, we propose a novel framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in the vision-sensor modality (videos) by adaptively transferring and distilling knowledge from multiple wearable sensors. SAKDN uses multiple wearable sensors as teacher modalities and RGB videos as the student modality. To preserve local temporal relationships and facilitate the use of visual deep learning models, we transform the one-dimensional time-series signals of wearable sensors into two-dimensional images with a Gramian Angular Field (GAF) based virtual image generation model. We then build a novel Similarity-Preserving Adaptive Multi-modal Fusion Module to adaptively fuse the intermediate representation knowledge from the different teacher networks. Finally, to fully exploit and transfer the knowledge of the multiple well-trained teacher networks to the student network, we propose a novel Graph-guided Semantically Discriminative Mapping loss, which uses graph-guided ablation analysis to produce visual explanations that highlight the important regions across modalities while preserving the interrelations of the original data. Experimental results on the Berkeley-MHAD, UTD-MHAD, and MMAct datasets demonstrate the effectiveness of the proposed SAKDN.
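To make the virtual image generation step concrete, the sketch below shows the standard Gramian Angular Summation Field transform that underlies GAF-based encodings: a 1-D signal is rescaled to [-1, 1], mapped to polar angles, and expanded into a 2-D matrix of pairwise angular sums. This is a minimal illustration of the generic GAF formula only; the paper's full virtual image generation model (including how multiple sensor channels are combined into images) is not reproduced here, and the function and variable names are illustrative.

```python
import numpy as np

def gramian_angular_field(x):
    """Gramian Angular Summation Field (GASF) of a 1-D time series.

    Steps: min-max rescale the series to [-1, 1], map each value to a
    polar angle phi = arccos(x~), then form GASF[i, j] = cos(phi_i + phi_j).
    The result is a 2-D image that preserves local temporal relationships
    along its diagonal.
    """
    x = np.asarray(x, dtype=np.float64)
    x_scaled = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    # Guard against floating-point drift outside arccos's domain.
    x_scaled = np.clip(x_scaled, -1.0, 1.0)
    phi = np.arccos(x_scaled)
    return np.cos(phi[:, None] + phi[None, :])

# Example: encode a 128-sample wearable-sensor channel as a 128x128 image.
signal = np.sin(np.linspace(0, 4 * np.pi, 128))   # stand-in for accelerometer data
image = gramian_angular_field(signal)             # shape (128, 128)
```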
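For the fusion module, the building block is similarity-preserving knowledge distillation (Tung and Mori, ICCV 2019), which matches batch-wise sample-similarity matrices between teacher and student rather than the features themselves. The sketch below implements only that base loss under the assumption that SAKDN's module builds on it; the adaptive multi-teacher weighting described in the abstract is not shown, and the function name is illustrative.

```python
import torch
import torch.nn.functional as F

def similarity_preserving_loss(feat_teacher, feat_student):
    """Base similarity-preserving distillation loss.

    Both inputs are (B, ...) intermediate activations. Each is flattened
    per sample, turned into a (B, B) pairwise-similarity matrix,
    row-normalized, and compared with a squared Frobenius norm, so the
    student mimics the teacher's sample-to-sample relational structure.
    """
    b = feat_teacher.size(0)
    a_t = feat_teacher.reshape(b, -1)
    a_s = feat_student.reshape(b, -1)
    g_t = F.normalize(a_t @ a_t.t(), p=2, dim=1)  # teacher similarities
    g_s = F.normalize(a_s @ a_s.t(), p=2, dim=1)  # student similarities
    return (g_t - g_s).pow(2).sum() / (b * b)
```

In a multi-teacher setting such as the one the abstract describes, one such loss term per wearable-sensor teacher would be computed against the video student and combined, with the adaptive fusion deciding each teacher's contribution.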