In this paper, we tackle the new task of video-based Activated Muscle Group Estimation (AMGE), which aims at identifying the muscular regions activated while a person performs a specific activity. Video-based AMGE is an important yet overlooked problem. To this end, we provide MuscleMap136, featuring more than 15K video clips covering 136 different activities annotated with 20 muscle groups. This dataset opens up multiple video-based applications in sports and rehabilitation medicine. We further complement the main MuscleMap136 dataset, which specifically targets physical exercise, with Muscle-UCF90 and Muscle-HMDB41, new variants of the well-known activity recognition benchmarks extended with AMGE annotations. With MuscleMap136, we expose limitations of state-of-the-art human activity recognition architectures when multi-label muscle annotations must be handled and good generalization to unseen activities is required. To address this, we propose a new multimodal transformer-based model, TransM3E, which surpasses current activity recognition models for AMGE, especially when it comes to previously unseen activities. The datasets and code will be publicly available at https://github.com/KPeng9510/MuscleMap.