Estimating a scene reconstruction and the camera motion from in-body videos is challenging due to several factors, e.g., the deformation of in-body cavities and the lack of texture. In this paper we present Endo-Depth-and-Motion, a pipeline that estimates the 6-degrees-of-freedom camera pose and dense 3D scene models from monocular endoscopic videos. Our approach leverages recent advances in self-supervised depth networks to generate pseudo-RGBD frames, then tracks the camera pose using photometric residuals and fuses the registered depth maps in a volumetric representation. We present an extensive experimental evaluation on the public Hamlyn dataset, showing high-quality results and comparisons against relevant baselines. We also release all models and code for future comparisons.
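To make the tracking step concrete, the sketch below shows how photometric residuals between a pseudo-RGBD reference frame and a target frame can be computed for a candidate camera pose: reference pixels are lifted to 3D using the predicted depth, transformed by the pose, reprojected into the target view, and their intensities compared. This is a minimal illustration with assumed names (`backproject`, `photometric_residuals`) and nearest-neighbour sampling, not the paper's actual implementation, which would typically use sub-pixel interpolation, robust weighting, and a coarse-to-fine scheme.

```python
import numpy as np

def backproject(depth, K):
    """Lift every pixel of a depth map to a 3D point in camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=-1)  # shape (h, w, 3)

def photometric_residuals(I_ref, depth_ref, I_tgt, K, R, t):
    """Warp reference pixels into the target view with pose (R, t) and
    return per-pixel intensity differences (NaN where the warp falls
    outside the target image). Nearest-neighbour sampling for brevity."""
    h, w = I_ref.shape
    P = backproject(depth_ref, K).reshape(-1, 3)  # 3D points, reference frame
    P_tgt = P @ R.T + t                           # transform to target frame
    z = P_tgt[:, 2]
    u = K[0, 0] * P_tgt[:, 0] / z + K[0, 2]       # pinhole reprojection
    v = K[1, 1] * P_tgt[:, 1] / z + K[1, 2]
    ui = np.round(u).astype(int)
    vi = np.round(v).astype(int)
    valid = (z > 0) & (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h)
    r = np.full(h * w, np.nan)
    idx = np.flatnonzero(valid)
    r[idx] = I_tgt[vi[idx], ui[idx]] - I_ref.reshape(-1)[idx]
    return r.reshape(h, w)
```

A pose tracker would minimize a robust norm of these residuals over (R, t), e.g. by Gauss-Newton on the Lie algebra of SE(3); the fused volumetric model then integrates each registered depth map.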