Coaches in many sports disciplines commonly use video-based performance analyses of their athletes. In individual sports, these analyses mainly concern body posture. This paper focuses on the triple jump, high jump, and long jump, which require fine-grained locations on the athlete's body. Typical human pose estimation datasets provide only a very limited set of keypoints, which is not sufficient in this case. Therefore, we propose a method to detect arbitrary keypoints on the whole body of the athlete by leveraging the limited set of annotated keypoints and auto-generated segmentation masks of body parts. Evaluations show that our model is capable of detecting keypoints on the head, torso, hands, feet, arms, and legs, including bent elbows and knees. We analyze and compare different techniques for encoding the desired keypoints as the model's input and for embedding them for the Transformer backbone.