Surgical activity recognition and prediction can provide important context in many Robot-Assisted Surgery (RAS) applications, for example, surgical progress monitoring and estimation, surgical skill evaluation, and shared control strategies during teleoperation. Transformer models were first developed for Natural Language Processing (NLP) to model word sequences, and the architecture soon gained popularity for general sequence modeling tasks. In this paper, we propose the novel use of a Transformer model for three tasks: gesture recognition, gesture prediction, and trajectory prediction during RAS. We modify the original Transformer architecture to generate estimates of the current gesture sequence, future gesture sequence, and future trajectory sequence using only the current kinematic data of the surgical robot end-effectors. We evaluate our proposed models on the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS) and use Leave-One-User-Out (LOUO) cross-validation to ensure the generalizability of our results. Our models achieve up to 89.3\% gesture recognition accuracy, 84.6\% gesture prediction accuracy (1 second ahead), and 2.71 mm trajectory prediction error (1 second ahead). Our models match or outperform state-of-the-art methods while using only the kinematic data channel. This approach can enable near-real-time surgical activity recognition and prediction.
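To make the setup concrete, below is a minimal sketch in PyTorch of a Transformer that maps a window of end-effector kinematics to per-timestep gesture logits and a predicted future trajectory. This is not the authors' exact architecture (the paper modifies the full encoder-decoder Transformer; the sketch uses an encoder with task heads), and the layer sizes, window length, and class names are illustrative assumptions, not the paper's configuration. The feature and label dimensions follow JIGSAWS conventions (76-dimensional kinematics, 15 gesture classes).

\begin{verbatim}
# Minimal sketch (not the authors' exact model): a Transformer encoder over
# robot end-effector kinematics with a gesture head and a trajectory head.
# Dimensions are assumptions based on JIGSAWS (76-dim kinematics, 15 gestures).
import torch
import torch.nn as nn

class KinematicTransformer(nn.Module):
    def __init__(self, kin_dim=76, d_model=128, nhead=8, num_layers=4,
                 num_gestures=15, traj_dim=3, horizon=30, max_len=512):
        super().__init__()
        self.input_proj = nn.Linear(kin_dim, d_model)
        self.pos_emb = nn.Parameter(torch.randn(1, max_len, d_model) * 0.02)
        enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=num_layers)
        self.gesture_head = nn.Linear(d_model, num_gestures)     # per-step logits
        self.traj_head = nn.Linear(d_model, traj_dim * horizon)  # future xyz path
        self.horizon, self.traj_dim = horizon, traj_dim

    def forward(self, kin):                      # kin: (batch, time, kin_dim)
        t = kin.size(1)
        h = self.input_proj(kin) + self.pos_emb[:, :t]
        h = self.encoder(h)                      # (batch, time, d_model)
        gestures = self.gesture_head(h)          # current-gesture sequence logits
        traj = self.traj_head(h[:, -1])          # predict from the last timestep
        return gestures, traj.view(-1, self.horizon, self.traj_dim)

# Usage: a 1-second kinematic window at an assumed 30 Hz sampling rate.
model = KinematicTransformer()
gestures, future_traj = model(torch.randn(2, 30, 76))
print(gestures.shape, future_traj.shape)  # (2, 30, 15) (2, 30, 3)
\end{verbatim}

A horizon of 30 steps at 30 Hz corresponds to the 1-second-ahead prediction window reported in the results; the horizon and sampling rate here are assumptions for illustration.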