Surgical phase recognition is a fundamental task in computer-assisted surgery systems. Most existing methods require frame-wise annotations, which are expensive and time-consuming to produce. In this paper, we introduce timestamp supervision to surgical phase recognition for the first time; it requires labeling only one randomly chosen frame for each phase in a video. Under timestamp supervision, current methods for natural videos aim to generate pseudo labels for all frames. However, because surgical videos contain ambiguous phase boundaries, these methods generate many noisy and inconsistent pseudo labels, which limits their performance. We argue that less is more in surgical phase recognition, i.e., fewer but discriminative pseudo labels outperform full but ambiguous ones. To this end, we propose a novel method called uncertainty-aware temporal diffusion to generate trustworthy pseudo labels. Our approach evaluates the confidence of the generated pseudo labels using uncertainty estimation. We then treat the annotated frames as anchors and let pseudo labels diffuse to both sides, starting from the anchors and stopping at high-uncertainty frames. In this way, our method generates contiguous, confident pseudo labels while discarding the uncertain ones. Extensive experiments demonstrate that our method not only significantly reduces annotation cost but also outperforms fully supervised methods. Moreover, our approach can be used to clean noisy labels near boundaries and improve the performance of current surgical phase recognition methods.
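The diffusion step described above can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the function name `diffuse_pseudo_labels`, the per-frame `uncertainty` scores, the fixed `threshold`, and the rule that diffusion also stops at frames already labeled by a neighboring anchor are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def diffuse_pseudo_labels(
    uncertainty: List[float],
    anchors: Dict[int, int],
    threshold: float,
) -> List[Optional[int]]:
    """Spread each anchor's phase label to neighboring frames.

    uncertainty[t] is a per-frame uncertainty score (hypothetical input),
    anchors maps an annotated frame index to its phase label, and
    threshold is an assumed cut-off above which a frame stays unlabeled.
    """
    labels: List[Optional[int]] = [None] * len(uncertainty)
    # Annotated frames are trusted regardless of their uncertainty.
    for t, phase in anchors.items():
        labels[t] = phase
    for t, phase in sorted(anchors.items()):
        for step in (-1, 1):  # diffuse to the left, then to the right
            i = t + step
            while 0 <= i < len(uncertainty):
                # Assumed stopping rule: a high-uncertainty frame or a
                # frame that already carries a label ends the diffusion.
                if labels[i] is not None or uncertainty[i] > threshold:
                    break
                labels[i] = phase
                i += step
    return labels


# Toy video of 7 frames with one annotated frame per phase.
toy_uncertainty = [0.1, 0.1, 0.2, 0.9, 0.8, 0.1, 0.1]
toy_anchors = {1: 0, 5: 1}  # frame index -> phase label
toy_labels = diffuse_pseudo_labels(toy_uncertainty, toy_anchors, threshold=0.5)
print(toy_labels)
```

In this toy input the two high-uncertainty frames between the anchors stay unlabeled (`None`), which mirrors the idea of keeping fewer but more reliable pseudo labels near an ambiguous boundary.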