Human activity recognition (HAR) using wearable sensors has benefited far less from recent advances in machine learning than fields such as computer vision and natural language processing. This is largely due to the lack of large-scale repositories of labeled training data. In our research we aim to facilitate the use of online videos, which exist in ample quantity for most activities and are much easier to label than sensor data, to simulate labeled wearable motion sensor data. In previous work we demonstrated preliminary results in this direction, focusing on very simple, activity-specific simulation models and a single sensor modality (acceleration norm)~\cite{10.1145/3341162.3345590}. In this paper we show how to train a regression model on generic motions for both accelerometer and gyroscope signals and then apply it to videos of the target activities to generate synthetic IMU data (acceleration and gyroscope norms) that can be used to train and/or improve HAR models. We demonstrate that systems trained on simulated data generated by our regression model come within around 10% of the mean F1 score of a system trained on real sensor data. Furthermore, we show that this remaining gap can be closed either by including a small amount of real sensor data for model calibration, or simply by exploiting the fact that, in general, far more simulated data can be generated from video than real sensor data can be collected.
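The pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the pose features, window sizes, and the choice of an MLP regressor are all stand-in assumptions, and random arrays take the place of real video-derived keypoint trajectories and IMU recordings.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Hypothetical stand-in data: sliding windows of flattened 2-D keypoint
# trajectories extracted from video (frames x 2 coordinates), paired with
# the mean acceleration norm of the corresponding real IMU windows.
n_windows, frames = 200, 30
pose_windows = rng.normal(size=(n_windows, frames * 2))
accel_norm = np.linalg.norm(
    rng.normal(size=(n_windows, frames, 3)), axis=2
).mean(axis=1)

# Step 1: train a regression model on generic motions, mapping
# video-derived pose windows to IMU signal norms.
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500,
                     random_state=0)
model.fit(pose_windows, accel_norm)

# Step 2: apply the model to pose windows from videos of the target
# activities to simulate labeled sensor data for HAR training.
target_pose = rng.normal(size=(5, frames * 2))
simulated_accel = model.predict(target_pose)
print(simulated_accel.shape)  # one simulated norm value per window
```

A second regressor of the same form would be trained for the gyroscope norm; the simulated windows then feed a standard HAR classifier in place of (or alongside) real sensor data.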