Previous work has demonstrated that virtual accelerometry data, extracted from videos using cross-modality transfer approaches such as IMUTube, is beneficial for training complex and effective human activity recognition (HAR) models. Systems like IMUTube were originally designed to cover activities that are based on substantial body (part) movements. Yet life is complex, and a range of activities of daily living involves only rather subtle movements, which raises the question of to what extent systems like IMUTube are also of value for fine-grained HAR, i.e., when does IMUTube break? In this work, we first introduce a measure to quantitatively assess the subtlety of the human movements underlying activities of interest, the motion subtlety index (MSI), which captures local pixel movements and pose changes in the vicinity of target virtual sensor locations, and we correlate it with the eventual activity recognition accuracy. We then perform a "stress test" on IMUTube and explore for which activities with underlying subtle movements a cross-modality transfer approach works, and for which it does not. As such, the work presented in this paper allows us to map out the landscape of IMUTube applications in practical scenarios.