Current approaches to automating laparoscopic camera motion rely on rule-based systems or focus solely on surgical tools. Imitation Learning (IL) could alleviate these shortcomings, but has so far only been applied to oversimplified setups. In this work, we instead introduce a method that extracts a laparoscope holder's actions from videos of real laparoscopic interventions. We synthetically add camera motion to a newly acquired dataset of camera-motion-free da Vinci surgery image sequences via a novel homography generation algorithm. The synthetic camera motion serves as a supervisory signal for camera motion estimation that is invariant to object and tool motion. We extensively evaluate state-of-the-art (SOTA) Deep Neural Networks (DNNs) across multiple compute regimes and find that our method transfers from the camera-motion-free da Vinci surgery dataset to videos of laparoscopic interventions, outperforming classical homography estimation approaches in both precision, by 41%, and CPU runtime, by 43%.
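To illustrate the core idea of generating synthetic camera motion as a supervisory signal, the sketch below jitters the four image corners and recovers the induced homography with a Direct Linear Transform. This is a minimal, generic illustration in NumPy; the function names (`four_point_homography`, `synthetic_motion_label`), the corner-jitter parameterization, and the `max_shift` parameter are assumptions for demonstration and do not reproduce the paper's actual homography generation algorithm.

```python
import numpy as np

def four_point_homography(src, dst):
    """Direct Linear Transform: homography mapping 4 src points to 4 dst points.
    Generic DLT, not the paper's algorithm."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The null vector of A (last right-singular vector) holds the 9 entries of H.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=np.float64))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def synthetic_motion_label(h, w, max_shift=32, rng=None):
    """Simulate camera motion on a static frame by randomly displacing the four
    image corners; the displacements (or the homography itself) can then serve
    as the supervision target for a camera motion estimator."""
    rng = rng or np.random.default_rng(0)
    src = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=np.float64)
    disp = rng.uniform(-max_shift, max_shift, size=(4, 2))
    H = four_point_homography(src, src + disp)
    return H, disp
```

Warping the original frame with `H` yields an image pair whose relative camera motion is known exactly, independent of any object or tool motion in the scene.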