Analyses of body posture are crucial for top-class athletes in many sports disciplines. Because manual annotation is very costly, coaches label only the most important keypoints, if they label any at all. This paper proposes a method to detect arbitrary keypoints on the limbs and skis of professional ski jumpers that requires only a few, partly correct segmentation masks during training. Our model is based on the Vision Transformer architecture, with a special design of the input tokens that lets us query for the desired keypoints. Since we use segmentation masks only to generate ground truth labels for the freely selectable keypoints, partly correct segmentation masks are sufficient for our training procedure. Hence, there is no need for costly hand-annotated segmentation masks. We analyze different training techniques for freely selected and standard keypoints, including pseudo labels, and show in our experiments that a few partly correct segmentation masks are sufficient for learning to detect arbitrary keypoints on limbs and skis.
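To make the idea of generating ground truth for freely selectable keypoints from a segmentation mask more concrete, the following is a minimal sketch of one possible scheme. It is an assumption for illustration, not the procedure confirmed by the abstract: a keypoint is placed at a relative position `along` the line between two standard keypoints and then shifted perpendicular to that line by a fraction `across` of the distance to the mask boundary. The names `boundary_distance`, `sample_keypoint`, `along`, and `across` are hypothetical.

```python
import numpy as np


def boundary_distance(mask, start, direction, max_steps=1000):
    """Count unit steps from `start` along `direction` that stay inside the mask."""
    height, width = mask.shape
    dist = 0.0
    for step in range(1, max_steps + 1):
        x, y = start + step * direction
        xi, yi = int(round(x)), int(round(y))
        inside = 0 <= xi < width and 0 <= yi < height and bool(mask[yi, xi])
        if not inside:
            break
        dist = float(step)
    return dist


def sample_keypoint(mask, kp_a, kp_b, along, across):
    """Hypothetical label generator (an assumption, not the paper's exact method).

    mask:   2D boolean array, True on the body part (may be only partly correct).
    kp_a, kp_b: (x, y) standard keypoints enclosing the part, e.g. knee and ankle.
    along:  position between kp_a (0.0) and kp_b (1.0).
    across: offset in [-1, 1]; -1 and 1 are the two mask boundaries, 0 the center line.
    """
    kp_a = np.asarray(kp_a, dtype=float)
    kp_b = np.asarray(kp_b, dtype=float)
    axis = kp_b - kp_a
    length = np.linalg.norm(axis)
    if length == 0:
        raise ValueError("kp_a and kp_b must differ")
    base = kp_a + along * axis
    # Unit vector perpendicular to the limb axis.
    normal = np.array([-axis[1], axis[0]]) / length
    side = normal if across >= 0 else -normal
    reach = boundary_distance(mask, base, side)
    return base + abs(across) * reach * side


# Toy example: a horizontal "limb" five pixels thick.
mask = np.zeros((20, 40), dtype=bool)
mask[8:13, 5:36] = True
point = sample_keypoint(mask, (5, 10), (35, 10), along=0.5, across=1.0)
```

Because each label depends only on the mask pixels along one perpendicular line, errors elsewhere in a mask do not affect it, which is one way to see why partly correct masks could suffice for such a scheme.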