Current laparoscopic camera motion automation relies on rule-based approaches or focuses solely on surgical tools. Imitation Learning (IL) methods could alleviate these shortcomings, but have so far been applied only to oversimplified setups. In this work, we instead introduce a method that extracts a laparoscope holder's actions from videos of laparoscopic interventions. Through a novel homography generation algorithm, we synthetically add camera motion to a newly acquired dataset of camera-motion-free da Vinci surgery image sequences. The synthetic camera motion serves as a supervisory signal for camera motion estimation that is invariant to object and tool motion. We perform an extensive evaluation of state-of-the-art (SOTA) Deep Neural Networks (DNNs) across multiple compute regimes and find that our method transfers from the camera-motion-free da Vinci surgery dataset to videos of laparoscopic interventions, outperforming classical homography estimation approaches in both precision, by 41%, and CPU runtime, by 43%.
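The details of the homography generation algorithm are not given in this abstract; the sketch below only illustrates the general idea of synthesizing camera motion as a supervisory signal, using the common four-point corner-perturbation parameterization of a homography. All function names, the perturbation range, and the plain-NumPy DLT solver are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate the 3x3 homography mapping src -> dst (four point pairs)
    via the Direct Linear Transform: solve A h = 0 with SVD."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=np.float64))
    H = Vt[-1].reshape(3, 3)  # right singular vector of smallest singular value
    return H / H[2, 2]

def random_synthetic_motion(h, w, max_shift=32, rng=None):
    """Randomly displace the four image corners (a stand-in for synthetic
    camera motion) and return the induced homography plus the offsets,
    which could serve as the regression target during training."""
    rng = np.random.default_rng(rng)
    corners = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]],
                       dtype=np.float64)
    offsets = rng.uniform(-max_shift, max_shift, size=(4, 2))
    H = dlt_homography(corners, corners + offsets)
    return H, offsets
```

Warping a still (camera-motion-free) frame with `H` then yields an image pair whose ground-truth camera motion is known exactly, independent of any object or tool motion in the scene.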