Deep learning-based methods have achieved promising results on surgical instrument segmentation. However, their high computation cost may limit the application of deep models to time-sensitive tasks such as online surgical video analysis for robotic-assisted surgery. Moreover, current methods may still suffer from challenging conditions in surgical images, such as varying illumination and the presence of blood. We propose a novel Multi-frame Feature Aggregation (MFFA) module that aggregates video frame features temporally and spatially in a recurrent manner. By distributing the computation load of deep feature extraction over sequential frames, we can use a lightweight encoder to reduce the computation cost at each time step. In addition, because public surgical videos are usually not labeled frame by frame, we develop a method that randomly synthesizes a surgical frame sequence from a single labeled frame to assist network training. We demonstrate that our approach outperforms corresponding deeper segmentation models on two public surgery datasets.
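To make the recurrent aggregation concrete, here is a minimal sketch of the idea: a lightweight encoder extracts features from each incoming frame, and a gated recurrent cell fuses them with a running aggregated state. The class names (LightEncoder, MFFACell), channel sizes, and the ConvGRU-style update are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch: recurrent multi-frame feature aggregation with a
# lightweight per-frame encoder. All module names and sizes are hypothetical.
import torch
import torch.nn as nn

class LightEncoder(nn.Module):
    """Lightweight per-frame encoder (hypothetical stand-in)."""
    def __init__(self, in_ch=3, feat_ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class MFFACell(nn.Module):
    """ConvGRU-style cell fusing current features with the running state."""
    def __init__(self, ch):
        super().__init__()
        self.gates = nn.Conv2d(2 * ch, 2 * ch, 3, padding=1)  # update/reset
        self.cand = nn.Conv2d(2 * ch, ch, 3, padding=1)       # candidate state
    def forward(self, feat, state):
        z, r = torch.sigmoid(self.gates(torch.cat([feat, state], 1))).chunk(2, 1)
        h = torch.tanh(self.cand(torch.cat([feat, r * state], 1)))
        return (1 - z) * state + z * h

# Per time step: encode the new frame cheaply, then refresh the aggregated
# state, so deep feature extraction is spread across sequential frames.
encoder, cell = LightEncoder(), MFFACell(32)
state = torch.zeros(1, 32, 64, 64)            # running aggregated features
for frame in torch.randn(5, 1, 3, 256, 256):  # a 5-frame clip
    state = cell(encoder(frame), state)       # temporal-spatial aggregation
# `state` would feed a segmentation decoder at each step.
```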
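The single-frame sequence synthesis can likewise be sketched. The paper's transform family is not specified here, so the chained small random affine motions, the synthesize_clip helper, and all parameter ranges below are hypothetical; the key point is that the identical transform is applied to the frame and its label at each step.

```python
# Hedged sketch: synthesize a pseudo video clip from one labeled frame by
# chaining small random affine motions. Parameters are illustrative only.
import random
import torch
import torchvision.transforms.functional as TF
from torchvision.transforms import InterpolationMode

def synthesize_clip(image, mask, length=5, max_shift=8, max_angle=3.0):
    """Return (frames, masks): each step applies a fresh small random motion."""
    frames, masks = [image], [mask]
    for _ in range(length - 1):
        angle = random.uniform(-max_angle, max_angle)
        shift = [random.randint(-max_shift, max_shift) for _ in range(2)]
        scale = random.uniform(0.97, 1.03)
        # Apply the identical transform to the frame and its label so the
        # synthetic pair stays consistent.
        frames.append(TF.affine(frames[-1], angle, shift, scale, shear=[0.0],
                                interpolation=InterpolationMode.BILINEAR))
        masks.append(TF.affine(masks[-1], angle, shift, scale, shear=[0.0],
                               interpolation=InterpolationMode.NEAREST))
    return torch.stack(frames), torch.stack(masks)

img = torch.rand(3, 256, 256)                      # one labeled frame
msk = torch.randint(0, 2, (1, 256, 256)).float()   # its segmentation mask
clip, labels = synthesize_clip(img, msk)           # 5-frame training sequence
```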