The field of surgical computer vision has seen considerable breakthroughs in recent years with the rising popularity of deep neural network-based methods. However, standard fully supervised approaches for training such models require vast amounts of annotated data, imposing a prohibitively high cost, especially in the clinical domain. Self-Supervised Learning (SSL) methods, which have begun to gain traction in the general computer vision community, represent a potential solution to these annotation costs, allowing useful representations to be learned from unlabeled data alone. Still, the effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored. In this work, we address this critical need by investigating four state-of-the-art SSL methods (MoCo v2, SimCLR, DINO, SwAV) in the context of surgical computer vision. We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding: phase recognition and tool presence detection. We examine their parameterization, then their behavior with respect to training data quantities in semi-supervised settings. Correct transfer of these methods to surgery, as described and conducted in this work, leads to substantial performance gains over generic uses of SSL (up to 7.4% on phase recognition and 20% on tool presence detection) and outperforms state-of-the-art semi-supervised phase recognition approaches by up to 14%. Further results obtained on a highly diverse selection of surgical datasets exhibit strong generalization properties. The code will be made available at https://github.com/CAMMA-public/SelfSupSurg.