In minimally invasive surgery, surgical workflow segmentation from video analysis is a well-studied topic. The conventional approach frames it as a multi-class classification problem in which each video frame is assigned a surgical phase label. We introduce a novel reinforcement learning formulation for offline phase transition retrieval: instead of attempting to classify every video frame, we identify the timestamp of each phase transition. By construction, our model produces contiguous phase blocks and therefore cannot generate spurious, noisy phase transitions. We investigate two configurations of this model. The first does not require processing all frames in a video (fewer than 60% and fewer than 20% of frames in two different applications) while achieving accuracy slightly below the state of the art. The second processes all video frames and outperforms the state of the art at a comparable computational cost. We compare our method against the recent top-performing frame-based approaches TeCNO and Trans-SVNet on the public Cholec80 dataset and on an in-house dataset of laparoscopic sacrocolpopexy. We perform both a frame-based (accuracy, precision, recall, and F1-score) and an event-based (event ratio) evaluation of our algorithms.
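To make the output representation concrete, the following minimal Python sketch (not the authors' implementation; the function name and data layout are hypothetical) shows how a sorted list of retrieved phase-transition timestamps expands into a per-frame labeling. Because labels are filled block by block, the result is contiguous by construction and cannot contain isolated, spurious phase switches:

```python
from typing import List, Tuple

def transitions_to_frame_labels(
    transitions: List[Tuple[int, int]],  # (start_frame, phase_id) pairs, sorted by start_frame
    num_frames: int,
) -> List[int]:
    """Expand phase-transition timestamps into per-frame phase labels.

    Each transition (t, p) means phase p begins at frame t and lasts until
    the next transition (or the end of the video), so the output consists
    only of contiguous phase blocks.
    """
    labels = [0] * num_frames
    # Pair each transition with the start of the next one; the sentinel
    # (num_frames, -1) closes the final block at the end of the video.
    for (start, phase), (next_start, _) in zip(
        transitions, transitions[1:] + [(num_frames, -1)]
    ):
        labels[start:next_start] = [phase] * (next_start - start)
    return labels

# Hypothetical example: phase 0 starts at frame 0, phase 1 at frame 100,
# phase 2 at frame 250, in a 300-frame video.
print(transitions_to_frame_labels([(0, 0), (100, 1), (250, 2)], num_frames=300))
```

In this representation the model only has to emit one (timestamp, phase) pair per transition, which is also why the first configuration can skip a large fraction of frames entirely.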