Knowledge of interaction forces during teleoperated robot-assisted surgery could be used to enable force feedback to human operators and to evaluate tissue handling skill. However, direct force sensing at the end-effector is challenging because it requires biocompatible, sterilizable, and cost-effective sensors. Vision-based deep learning using convolutional neural networks is a promising approach for providing useful force estimates, though questions remain about generalization to new scenarios and about real-time inference. We present a force estimation neural network that uses RGB images and robot state as inputs. Using a self-collected dataset, we compared the network to variants that included only a single input type, and we evaluated how they generalized to new viewpoints, workspace positions, materials, and tools. We found that vision-based networks were sensitive to shifts in viewpoint, while state-only networks were robust to changes in workspace position. The network with both state and vision inputs had the highest accuracy for an unseen tool and was moderately robust to changes in viewpoint. Through feature removal studies, we found that using only position features as input produced better accuracy than using only force features. The network with both state and vision inputs outperformed a physics-based baseline model in accuracy. It achieved accuracy comparable to a baseline recurrent neural network but with faster computation times, making it better suited for real-time applications.
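To make the multimodal design concrete, the following is a minimal sketch of a force-estimation network that fuses an RGB image branch with a robot-state branch. The choice of a ResNet-18 backbone, the layer sizes, the 14-dimensional state vector, and the 3-axis force output are illustrative assumptions, not the exact architecture described in the paper.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class ForceEstimator(nn.Module):
    """Hypothetical vision + state force regressor (dimensions are assumptions)."""

    def __init__(self, state_dim: int = 14, force_dim: int = 3):
        super().__init__()
        # Vision branch: ResNet-18 with its classification layer removed.
        backbone = models.resnet18(weights=None)
        self.vision = nn.Sequential(*list(backbone.children())[:-1])  # -> (B, 512, 1, 1)
        # State branch: small MLP over robot state (e.g., joint positions/velocities).
        self.state = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        # Fusion by concatenation, followed by a force regression head.
        self.head = nn.Sequential(
            nn.Linear(512 + 64, 128), nn.ReLU(),
            nn.Linear(128, force_dim),
        )

    def forward(self, rgb: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        v = self.vision(rgb).flatten(1)  # (B, 512) image features
        s = self.state(state)            # (B, 64) state features
        return self.head(torch.cat([v, s], dim=1))


# Example: one 224x224 RGB frame plus a 14-dim robot state vector.
model = ForceEstimator()
force = model(torch.randn(1, 3, 224, 224), torch.randn(1, 14))
print(force.shape)  # torch.Size([1, 3])
```

State-only or vision-only variants of the kind compared in the paper correspond to dropping one branch and feeding the remaining features directly to the regression head.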