Multi-modal approaches to human activity recognition (HAR) have recently been shown to improve recognition accuracy. However, the restricted computational resources of wearable devices, e.g., smartwatches, cannot directly support such advanced methods. To tackle this issue, this study introduces an end-to-end Vision-to-Sensor Knowledge Distillation (VSKD) framework. In this VSKD framework, only time-series data, i.e., accelerometer data, is required from wearable devices during the testing phase. Therefore, this framework not only reduces the computational demands on edge devices, but also produces a learning model that closely matches the performance of the computationally expensive multi-modal approach. In order to retain the local temporal relationships and facilitate visual deep learning models, we first convert the time-series data into two-dimensional images by applying a Gramian Angular Field (GAF) based encoding method. We adopt ResNet18 and a multi-scale TRN with BN-Inception as the teacher and student networks in this study, respectively. A novel loss function, named the Distance and Angle-wised Semantic Knowledge (DASK) loss, is proposed to mitigate the modality variations between the vision and the sensor domains. Extensive experimental results on the UTD-MHAD, MMAct, and Berkeley-MHAD datasets demonstrate the effectiveness and competitiveness of the proposed VSKD model, which can be deployed on wearable sensors.
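For reference, the GAF encoding step mentioned above can be sketched as follows. This is a minimal illustration of the standard Gramian Angular (Summation) Field transform applied to one accelerometer axis; the function name gaf_encode, the 128-sample window, and the min-max rescaling details are illustrative assumptions rather than the paper's exact preprocessing pipeline.

```python
import numpy as np

def gaf_encode(x):
    """Encode a 1-D time series as a Gramian Angular (Summation) Field image.

    x: 1-D array of sensor readings (e.g., one accelerometer axis).
    Returns a (len(x), len(x)) matrix whose (i, j) entry is cos(phi_i + phi_j).
    """
    # Rescale the series to [-1, 1] so that arccos is well defined.
    x = np.asarray(x, dtype=np.float64)
    x_min, x_max = x.min(), x.max()
    x_scaled = 2.0 * (x - x_min) / (x_max - x_min + 1e-12) - 1.0
    x_scaled = np.clip(x_scaled, -1.0, 1.0)

    # Polar encoding: each value becomes an angle phi = arccos(value).
    phi = np.arccos(x_scaled)

    # GASF: pairwise cosine of angle sums preserves local temporal correlations.
    return np.cos(phi[:, None] + phi[None, :])

# Example: a 128-sample window from one accelerometer axis becomes a 128x128 image;
# stacking the three axes would yield a 3-channel input for an image backbone.
window = np.sin(np.linspace(0, 6 * np.pi, 128))
image = gaf_encode(window)
print(image.shape)  # (128, 128)
```

Stacking the GAF images of the three accelerometer axes as channels is one straightforward way to feed the encoded windows to a standard image network, which is the motivation for the conversion described in the abstract.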
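The abstract does not spell out the DASK formulation. The sketch below assumes a relational form that combines a distance-wise and an angle-wise term between teacher and student embeddings, in the spirit of relational knowledge distillation; the helper names (pairwise_distances, angle_potentials, dask_style_loss) and the loss weights are hypothetical and may differ from the paper's definition.

```python
import torch
import torch.nn.functional as F

def pairwise_distances(e):
    # Normalized pairwise Euclidean distances within a batch of embeddings.
    d = torch.cdist(e, e, p=2)
    mean = d[d > 0].mean()
    return d / (mean + 1e-12)

def angle_potentials(e):
    # Cosine of the angle formed at each anchor embedding by every pair of other embeddings.
    diff = e.unsqueeze(0) - e.unsqueeze(1)          # (B, B, D) pairwise difference vectors
    diff = F.normalize(diff, p=2, dim=2)
    return torch.bmm(diff, diff.transpose(1, 2))    # (B, B, B) angle cosines

def dask_style_loss(student_emb, teacher_emb, w_dist=1.0, w_angle=2.0):
    """Distance- and angle-wise relational loss between student and (frozen) teacher embeddings."""
    loss_dist = F.smooth_l1_loss(pairwise_distances(student_emb),
                                 pairwise_distances(teacher_emb.detach()))
    loss_angle = F.smooth_l1_loss(angle_potentials(student_emb),
                                  angle_potentials(teacher_emb.detach()))
    return w_dist * loss_dist + w_angle * loss_angle

# Example: a batch of 16 embeddings from each network (dimensions matched or projected beforehand).
student = torch.randn(16, 512, requires_grad=True)
teacher = torch.randn(16, 512)
print(dask_style_loss(student, teacher).item())
```

Matching pairwise distances and angles, rather than raw features, is one common way to transfer structural knowledge across modalities while tolerating the representation gap between the vision and sensor domains.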