Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing work recognizes surgical activities at a coarse-grained level, such as phases, steps, or events, and leaves out the fine-grained interaction details that more helpful AI assistance in the operating room requires. Recognizing surgical actions as <instrument, verb, target> triplets delivers comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021, an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. We describe the challenge setup and assess the state-of-the-art deep learning methods proposed by the participants. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms from competing teams are presented; they recognize surgical action triplets directly from surgical videos and achieve mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, provides a thorough methodological comparison and an in-depth analysis of the results, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved and highlights interesting directions for future research on fine-grained surgical activity recognition, which is of utmost importance for the development of AI in surgery.
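To make the reported metric concrete, the following is a minimal sketch of how a mean average precision over triplet classes could be computed from per-frame labels and scores. It is a generic illustration, not the challenge's official evaluation code: the function names, the toy data, and the choice to skip classes that have no positive frame are all assumptions made here.

```python
from typing import List, Optional, Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> Optional[float]:
    """Average precision for one triplet class.

    labels[i] is 1 if the triplet is present in frame i, else 0.
    scores[i] is the predicted confidence for that frame.
    Returns None when the class has no positive frame (assumption:
    such classes are left out of the mean).
    """
    # Rank frames from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions: List[float] = []
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            # Precision measured at the rank of each positive frame.
            precisions.append(hits / rank)
    if not precisions:
        return None
    return sum(precisions) / len(precisions)


def mean_average_precision(
    label_matrix: Sequence[Sequence[int]],
    score_matrix: Sequence[Sequence[float]],
) -> float:
    """Mean of per-class AP; rows are frames, columns are triplet classes."""
    num_classes = len(label_matrix[0])
    aps: List[float] = []
    for c in range(num_classes):
        ap = average_precision(
            [row[c] for row in label_matrix],
            [row[c] for row in score_matrix],
        )
        if ap is not None:
            aps.append(ap)
    return sum(aps) / len(aps) if aps else 0.0


# Hypothetical toy data: 4 frames, 3 triplet classes (columns).
# The third class never occurs, so it is skipped in the mean.
LABELS = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
]
SCORES = [
    [0.9, 0.2, 0.10],
    [0.8, 0.7, 0.30],
    [0.3, 0.1, 0.20],
    [0.1, 0.4, 0.05],
]

if __name__ == "__main__":
    print(f"mAP = {mean_average_precision(LABELS, SCORES):.4f}")
```

Each <instrument, verb, target> combination is treated as its own class, so a prediction counts as correct only when all three components match. This is one reason triplet mAP tends to be lower than the scores typically reported for coarser tasks such as phase recognition.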