In this paper, we tackle the new task of video-based Activated Muscle Group Estimation (AMGE), which aims to identify the muscle regions that are active during physical activity. To this end, we provide the MuscleMap136 dataset, featuring more than 15K video clips with 136 different activities and 20 labeled muscle groups. This dataset opens up multiple video-based applications in sports and rehabilitation medicine. We further complement the main MuscleMap136 dataset, which specifically targets physical exercise, with Muscle-UCF90 and Muscle-HMDB41, new variants of well-known activity recognition benchmarks extended with AMGE annotations. For AMGE models to be applicable in real-life situations, they must generalize well to types of physical activity that are not present during training and that may involve new combinations of activated muscles. To this end, our benchmark also covers an evaluation setting in which the model is tested on activity types excluded from the training set. Our experiments reveal that generalization remains a challenge for existing architectures adapted to the AMGE task. We therefore propose a new approach, TransM3E, which employs a transformer-based model with cross-modal multi-label knowledge distillation and surpasses all popular video classification models on both previously seen and new types of physical activity. The datasets and code will be publicly available at https://github.com/KPeng9510/MuscleMap.
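Because a single clip can activate several of the 20 muscle groups at once, AMGE is a multi-label problem. Each muscle group is an independent active/inactive decision rather than one class out of 20. The following is a minimal sketch of a generic multi-label knowledge distillation objective of the kind mentioned above. It assumes per-group sigmoid outputs and a teacher model, trained on some other input modality (not specified here), whose temperature-softened probabilities serve as soft targets for the student. This is not TransM3E's actual loss. The function names, `temperature`, `alpha`, and the 0.5 decision threshold are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

NUM_MUSCLE_GROUPS = 20  # MuscleMap136 annotates 20 muscle groups


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce(p: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy of probability p against a hard or soft target in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def multilabel_kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    labels: Sequence[int],
    temperature: float = 2.0,  # hypothetical softening temperature
    alpha: float = 0.5,  # hypothetical weight of the supervised term
) -> float:
    """Illustrative multi-label KD loss for one clip (not the paper's exact formulation)."""
    if not len(student_logits) == len(teacher_logits) == len(labels):
        raise ValueError("student, teacher and labels must cover the same muscle groups")
    n = len(labels)
    # Supervised term: each muscle group is scored independently against its 0/1 label.
    hard = sum(bce(sigmoid(s), y) for s, y in zip(student_logits, labels)) / n
    # Distillation term: match the teacher's temperature-softened per-group probabilities.
    soft = sum(
        bce(sigmoid(s / temperature), sigmoid(t / temperature))
        for s, t in zip(student_logits, teacher_logits)
    ) / n
    return alpha * hard + (1.0 - alpha) * soft


def predict_active_groups(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Indices of muscle groups whose predicted activation probability exceeds the threshold."""
    return [i for i, z in enumerate(logits) if sigmoid(z) > threshold]
```

For example, `predict_active_groups([2.0, -1.0, 0.5])` returns `[0, 2]`. Per-label sigmoids, rather than a softmax over all groups, let several muscle groups be predicted as active at the same time. This sketch uses the same independence assumption for the soft teacher targets.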