Reconstructing the surgical scene from stereo endoscopic video is an important and promising topic in surgical data science, with potential applications in surgical visual perception, robotic surgery education, and intra-operative context awareness. However, current methods are mostly restricted to reconstructing static anatomy, assuming no tissue deformation, no tool occlusion or de-occlusion, and no camera movement; these assumptions do not always hold in minimally invasive robotic surgery. In this work, we present an efficient reconstruction pipeline for highly dynamic surgical scenes that runs at 28 fps. Specifically, we design a transformer-based stereoscopic depth perception module for efficient depth estimation and a lightweight tool segmentor to handle tool occlusion. We then propose a dynamic reconstruction algorithm that estimates tissue deformation and camera movement and aggregates information over time for surgical scene reconstruction. We evaluate the proposed pipeline on two datasets: the public Hamlyn Centre Endoscopic Video Dataset and our in-house DaVinci robotic surgery dataset. The results demonstrate that our method can recover regions of the scene obstructed by the surgical tool and handle camera movement in realistic surgical scenarios effectively at real-time speed.
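The pipeline described above has three stages: stereo depth estimation, tool segmentation, and occlusion-aware temporal fusion. The skeleton below is a minimal sketch of that control flow only; the function names and the trivial stand-in computations (intensity-difference disparity, brightness-threshold tool mask) are hypothetical placeholders, not the authors' transformer network or segmentor.

```python
import numpy as np

def estimate_depth(left, right):
    # Placeholder for the transformer-based stereo depth module:
    # a trivial disparity proxy from the intensity difference.
    disparity = np.abs(left.astype(np.float32) - right.astype(np.float32)) + 1.0
    return 1.0 / disparity  # inverse disparity as a stand-in for depth

def segment_tool(frame, threshold=200):
    # Placeholder for the lightweight tool segmentor: bright pixels
    # stand in for the metallic instrument occluding the tissue.
    return frame >= threshold  # True where the tool occludes tissue

def fuse(canonical, depth, tool_mask, alpha=0.5):
    # Temporal aggregation: update the reconstruction only where the
    # tissue is visible, keeping previous estimates under the tool so
    # that occluded regions can be recovered from earlier frames.
    fused = canonical.copy()
    visible = ~tool_mask
    fused[visible] = (1 - alpha) * canonical[visible] + alpha * depth[visible]
    return fused

# Run the three stages over a short synthetic stereo sequence.
h, w = 4, 4
canonical = np.zeros((h, w), dtype=np.float32)
for _ in range(3):
    left = np.random.randint(0, 255, (h, w)).astype(np.uint8)
    right = np.random.randint(0, 255, (h, w)).astype(np.uint8)
    depth = estimate_depth(left, right)
    mask = segment_tool(left)
    canonical = fuse(canonical, depth, mask)
```

The key design point mirrored here is that fusion is masked by the tool segmentation, so frames where an instrument covers part of the tissue do not overwrite the reconstruction beneath it.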