The success or failure of modern computer-assisted surgery procedures hinges on precise six-degree-of-freedom (6DoF) position and orientation (pose) estimation of tracked instruments and tissue. In this paper, we present HMD-EgoPose, a single-shot learning-based approach to hand and object pose estimation, and demonstrate state-of-the-art performance on a benchmark dataset for monocular red-green-blue (RGB) 6DoF marker-less hand and surgical instrument pose tracking. Further, we show that the HMD-EgoPose framework can deliver performant 6DoF pose estimation on a commercially available optical see-through head-mounted display (OST-HMD) through a low-latency streaming approach. Our framework uses an efficient convolutional neural network (CNN) backbone for multi-scale feature extraction and a set of subnetworks to jointly learn the 6DoF pose representation of the rigid surgical drill instrument and the grasping orientation of the user's hand. To make our approach usable on a commercially available OST-HMD, the Microsoft HoloLens 2, we created a pipeline for low-latency video and data communication with a high-performance computing workstation capable of optimized network inference. HMD-EgoPose outperformed current state-of-the-art approaches on a benchmark dataset for surgical tool pose estimation, achieving an average tool 3D vertex error of 11.0 mm on real data and advancing progress towards a clinically viable marker-free tracking strategy. Through our low-latency streaming approach, we achieved a round-trip latency of 199.1 ms for pose estimation and augmented visualization of the tracked model when integrated with the OST-HMD. Our single-shot learned approach was robust to occlusion and complex surfaces and improved on current state-of-the-art approaches to marker-less tool and hand pose estimation.
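The average tool 3D vertex error reported above can be read as the mean Euclidean distance between the tool model's vertices placed at the predicted pose and at the ground-truth pose. The following is a minimal sketch under that assumption; it is not the authors' evaluation code, and the names `transform`, `mean_vertex_error`, and the toy vertex list are hypothetical.

```python
import math

def transform(vertices, rotation, translation):
    """Apply a rigid transform (3x3 rotation matrix, 3-vector translation) to each vertex."""
    return [
        tuple(
            sum(rotation[i][j] * v[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        for v in vertices
    ]

def mean_vertex_error(vertices, pose_pred, pose_true):
    """Mean Euclidean distance between model vertices under two 6DoF poses.

    Each pose is a (rotation, translation) pair; units follow the vertex units
    (assumed here to be millimetres).
    """
    pred = transform(vertices, *pose_pred)
    true = transform(vertices, *pose_true)
    return sum(math.dist(p, q) for p, q in zip(pred, true)) / len(vertices)

# Hypothetical toy model: three vertices of a tool mesh, in millimetres.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tool_vertices = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 0.0, 25.0)]

pose_true = (IDENTITY, (0.0, 0.0, 0.0))
pose_pred = (IDENTITY, (3.0, 4.0, 0.0))  # pure translation offset of length 5 mm

print(mean_vertex_error(tool_vertices, pose_pred, pose_true))
```

With a pure translation offset every vertex moves by the same amount, so the metric equals the offset length; a rotation error instead contributes more for vertices farther from the rotation axis, which is why a vertex-based error captures both components of a 6DoF pose in a single number.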