Articulated hand pose tracking is an underexplored problem with the potential to support a wide range of applications, especially in the medical domain. With a robust and accurate tracking system for in-vivo surgical videos, the motion dynamics and movement patterns of the hands can be captured and analyzed for rich tasks including skills assessment, training surgical residents, and temporal action recognition. In this work, we propose a novel hand pose estimation model, Res152-CondPose, which improves tracking accuracy by incorporating a hand pose prior into its pose prediction. We show improvements over state-of-the-art methods, which make frame-wise independent predictions, by following a temporally guided approach that effectively leverages past predictions. Additionally, we collect the first dataset, Surgical Hands, that provides multi-instance articulated hand pose annotations for in-vivo videos. Our dataset contains 76 video clips from 28 publicly available surgical videos and over 8.1k annotated hand pose instances. We provide bounding boxes, articulated hand pose annotations, and tracking IDs to enable both multi-instance area-based and articulated tracking. When evaluated on Surgical Hands, we show that our method outperforms the state of the art on both mean Average Precision (mAP), which measures pose estimation accuracy, and Multiple Object Tracking Accuracy (MOTA), which assesses pose tracking performance.
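For context, MOTA is conventionally computed with the standard CLEAR-MOT formulation; the definition below is from the tracking literature and is stated here only for reference, not as a contribution of this work:

\[ \mathrm{MOTA} = 1 - \frac{\sum_t \left(\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t\right)}{\sum_t \mathrm{GT}_t} \]

where, at each frame \(t\), \(\mathrm{FN}_t\) counts missed ground-truth instances, \(\mathrm{FP}_t\) counts false positives, \(\mathrm{IDSW}_t\) counts identity switches, and \(\mathrm{GT}_t\) is the number of ground-truth instances.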