Imitation learning (IL) has shown great promise for autonomous dexterous manipulation, including surgical tasks. Fully unlocking the potential of IL for surgery requires access to clinical datasets, which unfortunately lack the kinematic data that current IL approaches need. A promising source of large-scale surgical demonstrations is the monocular surgical video available online, which makes monocular pose estimation a crucial step toward large-scale robot learning. To this end, we propose SurgiPose, a differentiable-rendering-based approach that estimates kinematic information from monocular surgical videos, eliminating the need for direct access to ground-truth kinematics. Our method infers tool trajectories and joint angles by optimizing tool pose parameters to minimize the discrepancy between rendered and real images. To evaluate the approach, we conduct experiments on two robotic surgical tasks, tissue lifting and needle pickup, using the da Vinci Research Kit Si (dVRK Si). We train imitation learning policies on both measured ground-truth kinematics and kinematics estimated from video, and compare their performance. Policies trained on estimated kinematics achieve success rates comparable to those trained on ground-truth data, demonstrating the feasibility of monocular-video-based kinematic estimation for surgical robot learning. By enabling kinematic estimation from monocular surgical videos, our work lays the foundation for large-scale learning of autonomous surgical policies from online surgical data.
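The idea of recovering pose by minimizing the discrepancy between rendered and observed images can be illustrated with a deliberately tiny sketch. This is not the SurgiPose implementation: it assumes a 2D Gaussian blob as a stand-in for a rendered tool, a two-parameter "pose" (the blob center), a squared-error image loss, and a hand-derived gradient in place of a differentiable renderer. The names `render_blob` and `fit_pose`, the image size, and the step size are all hypothetical choices made for illustration.

```python
import math

SIZE = 16      # image is SIZE x SIZE pixels (assumed toy resolution)
SIGMA = 3.0    # blob width in pixels (assumed)


def render_blob(x, y):
    """Toy 'renderer': an image of a Gaussian blob centered at (x, y)."""
    return [[math.exp(-((u - x) ** 2 + (v - y) ** 2) / (2 * SIGMA ** 2))
             for u in range(SIZE)] for v in range(SIZE)]


def fit_pose(target, x0, y0, lr=0.3, steps=300):
    """Gradient descent on 0.5 * sum((rendered - target)^2) over (x, y)."""
    x, y = x0, y0
    for _ in range(steps):
        image = render_blob(x, y)
        gx = gy = 0.0
        for v in range(SIZE):
            for u in range(SIZE):
                residual = image[v][u] - target[v][u]
                # d(image)/dx = image * (u - x) / sigma^2, likewise for y
                gx += residual * image[v][u] * (u - x) / SIGMA ** 2
                gy += residual * image[v][u] * (v - y) / SIGMA ** 2
        x -= lr * gx
        y -= lr * gy
    return x, y


# The "observed" frame is rendered from a pose that the optimizer never sees.
observed = render_blob(9.5, 7.0)
estimate = fit_pose(observed, x0=6.0, y0=6.0)
print("estimated pose:", estimate)
```

In this toy setting the loss reaches zero exactly at the true center, so the estimate can be checked against the known pose. A real pipeline would replace the blob with a rendered tool model, the two parameters with tool pose and joint angles, and the hand-written gradient with automatic differentiation through the renderer.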