Human Activity Recognition (HAR) on mobile devices has been shown to be achievable with lightweight neural models learned from data generated by the user's inertial measurement units (IMUs). Most approaches for instance-based HAR have used Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), or a combination of the two to achieve state-of-the-art results with real-time performance. Recently, the Transformer architecture, first in the language processing domain and then in the vision domain, has pushed the state of the art beyond these classical architectures. However, the Transformer architecture is demanding in computing resources, which makes it ill-suited for the embedded HAR applications found in the pervasive computing domain. In this study, we present the Human Activity Recognition Transformer (HART), a lightweight, sensor-wise transformer architecture specifically adapted to the domain of the IMUs embedded on mobile devices. Our experiments on HAR tasks with several publicly available datasets show that HART uses fewer FLoating-point Operations Per Second (FLOPS) and fewer parameters while outperforming current state-of-the-art results. Furthermore, we evaluate how various architectures perform in heterogeneous environments and show that our models can better generalize to different sensing devices or on-body positions.
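To make the "sensor-wise" idea concrete, the sketch below shows one way a transformer block could give accelerometer and gyroscope embeddings separate attention branches before mixing them in a shared feed-forward layer. This is a minimal illustration, not the HART implementation: the `SensorWiseBlock` name, layer sizes, window framing, and the choice of PyTorch are all assumptions made for this example.

```python
# Minimal sketch (illustrative, not the authors' code): a sensor-wise
# transformer block for IMU windows, with one attention branch per sensor
# stream (accelerometer, gyroscope) and a shared feed-forward layer.
import torch
import torch.nn as nn


class SensorWiseBlock(nn.Module):
    def __init__(self, dim_per_sensor=64, heads=4, mlp_dim=128, dropout=0.1):
        super().__init__()
        # Separate pre-norm + self-attention for each sensor stream.
        self.norm_acc = nn.LayerNorm(dim_per_sensor)
        self.norm_gyro = nn.LayerNorm(dim_per_sensor)
        self.attn_acc = nn.MultiheadAttention(dim_per_sensor, heads,
                                              dropout=dropout, batch_first=True)
        self.attn_gyro = nn.MultiheadAttention(dim_per_sensor, heads,
                                               dropout=dropout, batch_first=True)
        # Shared feed-forward over the concatenated sensor embeddings.
        dim = 2 * dim_per_sensor
        self.norm_mlp = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_dim), nn.GELU(),
            nn.Dropout(dropout), nn.Linear(mlp_dim, dim),
        )

    def forward(self, x):
        # x: (batch, frames, 2 * dim_per_sensor); first half is the
        # accelerometer embedding, second half the gyroscope embedding.
        acc, gyro = x.chunk(2, dim=-1)
        a = self.norm_acc(acc)
        g = self.norm_gyro(gyro)
        acc = acc + self.attn_acc(a, a, a, need_weights=False)[0]
        gyro = gyro + self.attn_gyro(g, g, g, need_weights=False)[0]
        x = torch.cat([acc, gyro], dim=-1)
        return x + self.mlp(self.norm_mlp(x))


if __name__ == "__main__":
    # Example: a batch of 8 IMU windows framed into 16 patches,
    # each patch embedded to 64 dimensions per sensor.
    block = SensorWiseBlock()
    out = block(torch.randn(8, 16, 128))
    print(out.shape)  # torch.Size([8, 16, 128])
```

Keeping the attention branches per sensor keeps each branch's embedding (and thus its attention cost) small, which is one way a sensor-wise design can reduce FLOPS and parameters relative to a single full-width transformer over all IMU channels.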