This paper presents the design and results of the "PEg TRAnsfert Workflow recognition" (PETRAW) challenge, whose objective was to develop surgical workflow recognition methods based on one or several modalities among video, kinematic, and segmentation data, in order to study their added value. The PETRAW challenge provided a data set of 150 peg transfer sequences performed on a virtual simulator. The data set comprised videos, kinematics, semantic segmentation, and workflow annotations, which described the sequences at three levels of granularity: phase, step, and activity. Five tasks were proposed to the participants: three addressed the recognition of all granularities with a single modality each, while the other two addressed recognition with a combination of modalities. Average application-dependent balanced accuracy (AD-Accuracy) was used as the evaluation metric because it takes unbalanced classes into account and is more clinically relevant than a frame-by-frame score. Seven teams participated in at least one task, and four of them participated in all tasks. The best results were obtained by combining the video and kinematic data, with an AD-Accuracy between 90% and 93% for the four teams that participated in all tasks. The improvement of the video/kinematic-based methods over the unimodal ones was significant for all of the teams. However, the difference in testing execution time between the video/kinematic-based and the kinematic-based methods must be taken into consideration: is it relevant to spend 20 to 200 times more computing time for less than 3% of improvement? The PETRAW data set is publicly available at www.synapse.org/PETRAW to encourage further research in surgical workflow recognition.
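To illustrate why a balanced score matters when classes are unbalanced, the following minimal sketch computes plain balanced accuracy, taken here to be the mean of per-class recall. It is not the challenge's AD-Accuracy: the application-dependent part of that metric is not defined in this abstract, and the function name `balanced_accuracy` and the labels `"A"` and `"B"` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true.

    Assumption: this is ordinary balanced accuracy, NOT the
    application-dependent AD-Accuracy used in the challenge.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one frame is required")

    total = defaultdict(int)    # frames per ground-truth class
    correct = defaultdict(int)  # correctly predicted frames per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical frame labels: class "A" dominates, class "B" is rare.
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 10  # a degenerate model that always predicts "A"

frame_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(frame_accuracy)                     # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

In this toy case the frame-by-frame accuracy looks acceptable at 0.8 even though the rare class is never recognized, whereas the balanced score drops to 0.5.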