Surgical tool detection in minimally invasive surgery is an essential part of computer-assisted interventions. Current approaches are mostly supervised: they require large, fully labeled datasets for training and suffer from pseudo label bias caused by class imbalance. However, large image datasets with bounding box annotations are rarely available. Semi-supervised learning (SSL) has recently emerged as a means of training large models using only a modest amount of annotated data; apart from reducing the annotation cost, SSL has also shown promise in producing models that are more robust and generalizable. Therefore, in this paper we introduce an SSL framework for surgical tool detection that aims to mitigate the scarcity of training data and the data imbalance through a knowledge distillation approach. In the proposed work, we first train a model on labeled data; this model initialises Teacher-Student joint learning, in which the Student is trained on Teacher-generated pseudo labels from unlabeled data. We propose a multi-class distance with a margin-based classification loss function in the region-of-interest head of the detector to effectively segregate foreground classes from the background region. Our results on the m2cai16-tool-locations dataset indicate the superiority of our approach across different supervised data settings (1%, 2%, 5%, and 10% of annotated data), where our model achieves overall improvements of 8%, 12% and 27% in mAP (on 1% labeled data) over the state-of-the-art SSL methods and a fully supervised baseline, respectively. The code is available at https://github.com/Mansoor-at/Semi-supervised-surgical-tool-det
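The pseudo-labeling step mentioned above can be illustrated with a minimal sketch. It assumes that Teacher detections are filtered by a confidence threshold before they supervise the Student, which is a common choice in Teacher-Student detection but is not stated in the abstract. The `Detection` type, the `select_pseudo_labels` helper, the threshold of 0.7, and the example boxes are all hypothetical and are not taken from the released code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One hypothetical Teacher prediction on an unlabeled frame."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    label: int                              # predicted tool class index
    score: float                            # Teacher confidence in [0, 1]


def select_pseudo_labels(
    teacher_dets: List[Detection], threshold: float = 0.7
) -> List[Detection]:
    """Keep only Teacher detections confident enough to supervise the Student.

    The threshold value is an assumption made for illustration.
    """
    return [d for d in teacher_dets if d.score >= threshold]


# Made-up Teacher output for a single unlabeled image.
teacher_output = [
    Detection((10, 20, 110, 220), label=0, score=0.92),
    Detection((30, 40, 90, 100), label=3, score=0.41),
    Detection((200, 50, 320, 180), label=1, score=0.75),
]

pseudo_labels = select_pseudo_labels(teacher_output)
print([d.label for d in pseudo_labels])  # prints [0, 1]
```

In a setup like this, the retained detections would serve as targets for the Student on the unlabeled image, while the low-confidence detection is discarded rather than treated as ground truth.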