Phase recognition plays an essential role in surgical workflow analysis for computer-assisted intervention. The Transformer, originally proposed for sequential data modeling in natural language processing, has been successfully applied to surgical phase recognition. Existing transformer-based works mainly focus on modeling attention dependencies without introducing auto-regression. In this work, we propose for the first time an Auto-Regressive Surgical Transformer, referred to as ARST, for on-line surgical phase recognition from laparoscopic videos; it models inter-phase correlation implicitly through a conditional probability distribution. To reduce inference bias and enhance phase consistency, we further develop a consistency constraint inference strategy based on auto-regression. We conduct comprehensive validations on the well-known public dataset Cholec80. Experimental results show that our method outperforms state-of-the-art methods both quantitatively and qualitatively, and achieves an inference rate of 66 frames per second (fps).
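To make "on-line auto-regressive inference" concrete, here is a minimal sketch. It does not reproduce ARST, which uses a transformer to model the conditional distribution, and it is not the paper's consistency constraint strategy. It shows only the general idea: each frame's phase is predicted from a distribution conditioned on the previously predicted phase, and that prediction is then fed back for the next frame. Everything named here is a hypothetical assumption. The per-frame scores `frame_probs` and the first-order phase transition table `transition` stand in for the transformer decoder.

```python
import numpy as np


def autoregressive_online_decode(frame_probs, transition):
    """Greedy on-line decoding (illustrative only).

    Assumes p(y_t | y_{t-1}, x_t) is proportional to
    frame_probs[t, y_t] * transition[y_{t-1}, y_t]; a hypothetical
    stand-in for a learned auto-regressive model such as ARST.
    """
    frame_probs = np.asarray(frame_probs, dtype=float)
    transition = np.asarray(transition, dtype=float)
    preds = []
    prev = None
    for p in frame_probs:  # frames arrive one at a time (on-line setting)
        # Condition the current frame on the previously predicted phase.
        cond = p if prev is None else p * transition[prev]
        total = cond.sum()
        # Fall back to the frame-wise scores if the condition rules out every phase.
        cond = cond / total if total > 0 else p
        prev = int(np.argmax(cond))
        preds.append(prev)
    return preds


# Hypothetical 3-phase example: the transition table forbids moving backwards.
frame_probs = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],  # noisy frame: frame-wise argmax would jump back to phase 0
    [0.1, 0.2, 0.7],
]
transition = [
    [0.6, 0.4, 0.0],
    [0.0, 0.6, 0.4],
    [0.0, 0.0, 1.0],
]
print(autoregressive_online_decode(frame_probs, transition))  # [0, 1, 1, 2]
```

Frame-wise argmax over the same scores gives `[0, 1, 0, 2]`. Conditioning on the previous prediction suppresses the implausible backward jump, which is the intuition behind using auto-regression to improve phase consistency.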