Wearable sensor-based Human Action Recognition (HAR) has achieved remarkable success recently. However, the accuracy of wearable sensor-based HAR still lags far behind that of visual modality-based systems (e.g., RGB video, skeleton, and depth). Diverse input modalities can provide complementary cues and thus improve HAR accuracy, but how to take advantage of multi-modal data for wearable sensor-based HAR has rarely been explored. Currently, wearable devices such as smartwatches can capture only a limited range of non-visual modality data. This hinders multi-modal association, since visual and non-visual modality data cannot be used simultaneously. Another major challenge lies in how to efficiently utilize multi-modal data on wearable devices given their limited computational resources. In this work, we propose a novel Progressive Skeleton-to-sensor Knowledge Distillation (PSKD) model that uses only time-series data, i.e., accelerometer data, from a smartwatch to solve the wearable sensor-based HAR problem. Specifically, we construct multiple teacher models using data from both the teacher (human skeleton sequences) and student (time-series accelerometer data) modalities. In addition, we propose an effective progressive learning scheme to close the performance gap between the teacher and student models. We also design a novel loss function, Adaptive-Confidence Semantic (ACS), which allows the student model to adaptively select which teacher model, or the ground-truth label, to mimic. To demonstrate the effectiveness of the proposed PSKD method, we conduct extensive experiments on the Berkeley-MHAD, UTD-MHAD, and MMAct datasets. The results confirm that PSKD achieves competitive performance compared to previous mono-sensor-based HAR methods.
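The abstract does not give the exact form of the ACS loss. Purely as an illustration, the Python sketch below shows one way an adaptive-confidence selection between multiple teachers and the ground truth could look, under the assumption that each candidate target's "confidence" is the probability it assigns to the true class; the function name `acs_loss`, the temperature `tau`, and the selection rule itself are all hypothetical, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def acs_loss(student_logits, teacher_logits_list, labels, tau=2.0):
    """Hypothetical sketch of an ACS-style adaptive target selection.

    Per sample, the student mimics whichever candidate target (a teacher's
    softened distribution or the one-hot ground truth) places the highest
    probability on the true class. Assumed rule; the paper's may differ.
    """
    # Temperature-softened distributions from each teacher model.
    soft_targets = [F.softmax(t / tau, dim=1) for t in teacher_logits_list]
    # The one-hot ground truth acts as an always-available candidate target.
    one_hot = F.one_hot(labels, student_logits.size(1)).float()
    candidates = torch.stack(soft_targets + [one_hot], dim=0)  # (T+1, B, C)

    # Confidence of each candidate = probability assigned to the true class.
    idx = labels.view(1, -1, 1).expand(candidates.size(0), -1, 1)
    conf = candidates.gather(2, idx).squeeze(2)                # (T+1, B)
    best = conf.argmax(dim=0)                                  # per-sample pick

    chosen = candidates[best, torch.arange(labels.size(0))]    # (B, C)
    log_student = F.log_softmax(student_logits / tau, dim=1)
    # Standard KD form: KL divergence from the chosen target, scaled by tau^2.
    return F.kl_div(log_student, chosen, reduction="batchmean") * tau * tau
```

In this sketch the ground truth is treated as just another teacher, so early in training (when teachers outperform the student's hard-label fit) the soft targets dominate, and the selection can shift per sample as confidences change.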