Computer-assisted minimally invasive surgery has great potential to benefit modern operating theatres. The video data streamed from the endoscope provide rich information to support context-awareness for next-generation intelligent surgical systems. To achieve accurate perception and automatic manipulation during the procedure, learning-based techniques are a promising approach, having enabled advanced image analysis and scene understanding in recent years. However, learning such models relies heavily on large-scale, high-quality, multi-task labelled data. This is currently a bottleneck for the topic, as available public datasets are still extremely limited in the field of computer-assisted intervention (CAI). In this paper, we present and release the first integrated dataset (named AutoLaparo) with multiple image-based perception tasks to facilitate learning-based automation in hysterectomy surgery. Our AutoLaparo dataset is developed from full-length videos of entire hysterectomy procedures. Specifically, three different yet highly correlated tasks are formulated in the dataset: surgical workflow recognition, laparoscope motion prediction, and instrument and key anatomy segmentation. In addition, we provide experimental results with state-of-the-art models as reference benchmarks for further model development and evaluation on this dataset. The dataset is available at https://autolaparo.github.io.