Action segmentation is a challenging task in high-level process analysis, typically performed on video or kinematic data obtained from various sensors. In the context of surgical procedures, action segmentation is critical for workflow analysis algorithms. This work presents two contributions related to action segmentation on kinematic data. First, we introduce two multi-stage architectures, MS-TCN-BiLSTM and MS-TCN-BiGRU, specifically designed for kinematic data. The architectures consist of a prediction generator with intra-stage regularization and Bidirectional LSTM- or GRU-based refinement stages. Second, we propose two new data augmentation techniques, World Frame Rotation and Horizontal-Flip, which exploit the strong geometric structure of kinematic data to improve algorithm performance and robustness. We evaluate our models on three datasets of surgical suturing tasks: the Variable Tissue Simulation (VTS) Dataset and the newly introduced Bowel Repair Simulation (BRS) Dataset, both of which are open surgery simulation datasets collected by us, as well as the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a well-known benchmark in robotic surgery. Our methods achieve state-of-the-art performance on all benchmark datasets and establish a strong baseline for the BRS dataset.
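The abstract names the two augmentations but not their exact form. Below is a minimal sketch of one plausible implementation, assuming each kinematic sample is a (T, 3) array of tool-tip positions in a world frame with a vertical z axis; the function names, the per-sequence random angle, and the choice of flipped axis are illustrative assumptions, not the authors' specification.

```python
import numpy as np

def world_frame_rotation(positions, max_angle_rad=np.pi, rng=None):
    """Rotate a whole kinematic trajectory about the world vertical (z) axis.

    positions: array of shape (T, 3) with x, y, z coordinates.
    A single random angle is drawn per sequence, so the relative
    geometry of the motion (and hence the action labels) is preserved.
    """
    rng = rng if rng is not None else np.random.default_rng()
    theta = rng.uniform(-max_angle_rad, max_angle_rad)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return positions @ R.T

def horizontal_flip(positions, axis=0):
    """Mirror the trajectory across a vertical plane by negating one
    horizontal coordinate (here x), e.g. to mirror left/right motion."""
    flipped = positions.copy()
    flipped[:, axis] = -flipped[:, axis]
    return flipped
```

Because both transforms are rigid (a rotation) or a reflection, they leave distances and the vertical coordinate intact, which is the geometric structure the abstract refers to.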