Surgical workflow anticipation predicts what steps to conduct or what instruments to use next, an essential component of computer-assisted intervention systems for surgery, e.g. workflow reasoning in robotic surgery. However, current approaches are limited by their insufficient expressive power for the relationships between instruments. Hence, we propose a graph representation learning framework to comprehensively represent instrument motions in the surgical workflow anticipation problem. In our proposed graph representation, we map the bounding box information of instruments to graph nodes in consecutive frames and build inter-frame/inter-instrument graph edges to represent the trajectories and interactions of the instruments over time. This design enhances the ability of our network to model both the spatial and temporal patterns of surgical instruments and their interactions. In addition, we design a multi-horizon learning strategy to balance the understanding of the various horizons in different anticipation tasks, which significantly improves model performance in anticipation across horizons. Experiments on the Cholec80 dataset demonstrate that the performance of our proposed method exceeds that of the state-of-the-art method based on richer backbones, especially in instrument anticipation (1.27 vs. 1.48 for inMAE; 1.48 vs. 2.68 for eMAE). To the best of our knowledge, we are the first to introduce a spatial-temporal graph representation into surgical workflow anticipation.
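To make the graph construction concrete, the following is a minimal sketch of how instrument bounding boxes could be mapped to nodes, with inter-frame edges linking the same instrument across consecutive frames (trajectory) and inter-instrument edges linking instruments co-occurring in a frame (interaction). The function name, box format, and detection structure are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of the spatio-temporal graph construction:
# each detected instrument bounding box becomes a node; edges encode
# trajectories (inter-frame) and interactions (inter-instrument).
def build_st_graph(detections):
    """detections: list over frames; each frame is a dict mapping
    instrument_id -> (cx, cy, w, h) box centre and size (assumed format)."""
    nodes = []            # (frame_idx, instrument_id, box)
    node_index = {}       # (frame_idx, instrument_id) -> node id
    temporal_edges = []   # same instrument, consecutive frames
    spatial_edges = []    # different instruments, same frame

    # one node per detected instrument per frame
    for t, frame in enumerate(detections):
        for inst_id, box in frame.items():
            node_index[(t, inst_id)] = len(nodes)
            nodes.append((t, inst_id, box))

    for t, frame in enumerate(detections):
        ids = sorted(frame)
        # inter-instrument edges within the frame (pairwise interaction)
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                spatial_edges.append((node_index[(t, ids[i])],
                                      node_index[(t, ids[j])]))
        # inter-frame edges linking the same instrument to the next frame
        if t + 1 < len(detections):
            for inst_id in frame:
                if inst_id in detections[t + 1]:
                    temporal_edges.append((node_index[(t, inst_id)],
                                           node_index[(t + 1, inst_id)]))
    return nodes, temporal_edges, spatial_edges
```

The resulting node list and edge lists could then be fed to any graph neural network layer; how node features and edge weights are derived from the boxes is a design choice left open here.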