Manual labeling of gestures in robot-assisted surgery is labor intensive, prone to errors, and requires expertise or training. We propose a method for automated and explainable generation of gesture transcripts that leverages the abundance of data for image segmentation. Surgical context is detected from segmentation masks by examining the distances and intersections between the surgical tools and objects. Next, context labels are translated into gesture transcripts using knowledge-based Finite State Machine (FSM) and data-driven Long Short-Term Memory (LSTM) models. We evaluate the performance of each stage of our method by comparing the results with the ground truth segmentation masks, the consensus context labels, and the gesture labels in the JIGSAWS dataset. Our results show that our segmentation models achieve state-of-the-art performance in recognizing needle and thread in Suturing, and that we can automatically detect important surgical states in high agreement with crowd-sourced labels (e.g., contact between graspers and objects in Suturing). We also find that the FSM models are more robust to poor segmentation and labeling performance than LSTMs. Our proposed method can significantly shorten the gesture labeling process (by a factor of ~2.8).
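To illustrate the context-detection step described above, the following is a minimal, hypothetical sketch (not the authors' implementation) of how contact or proximity between a tool and an object could be inferred from binary segmentation masks using their intersection and pixel-wise distance. The function name contact_state, the near_px threshold, and the state labels are illustrative assumptions.

```python
# Hypothetical sketch: infer a tool-object context state from binary segmentation
# masks via mask intersection and minimum pixel distance. Thresholds and labels
# are illustrative only, not taken from the paper.
import numpy as np
from scipy.ndimage import distance_transform_edt

def contact_state(tool_mask: np.ndarray, object_mask: np.ndarray,
                  near_px: float = 5.0) -> str:
    """Classify the tool-object relation as 'contact', 'near', or 'apart'."""
    tool = tool_mask.astype(bool)
    obj = object_mask.astype(bool)
    if not tool.any() or not obj.any():
        return "apart"                      # one of the masks is empty
    if np.logical_and(tool, obj).any():
        return "contact"                    # masks intersect -> treat as contact
    # Distance (in pixels) from every pixel to the nearest object pixel;
    # the minimum over tool pixels approximates the tool-object gap.
    dist_to_obj = distance_transform_edt(~obj)
    gap = dist_to_obj[tool].min()
    return "near" if gap <= near_px else "apart"

# Toy usage: two small masks that overlap by one pixel.
tool = np.zeros((8, 8), bool); tool[2:5, 2:5] = True
needle = np.zeros((8, 8), bool); needle[4:7, 4:7] = True
print(contact_state(tool, needle))          # -> "contact"
```

A per-frame sequence of such states could then be fed to the FSM or LSTM stage to produce gesture transcripts.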