In recent years, video-based applications for surgical purposes have been developing rapidly. Some of these applications can run offline after the procedure ends, while others must respond immediately. However, there are also cases where the response is required during the procedure but some delay is acceptable. The performance gap between online and offline algorithms is well known in the literature. Our goal in this study was to characterize the performance-delay trade-off and to design an MS-TCN++-based algorithm that can exploit it. To this end, we used our open surgery simulation data-set, which contains 96 videos of 24 participants performing a suturing task on a variable tissue simulator. In this study, we used video data captured from the side view. The networks were trained to identify the surgical gestures performed. The naive approach is to reduce the depth of MS-TCN++; as a result, the receptive field shrinks and the number of required future frames decreases as well. We show that this method is sub-optimal, mainly for small delays. The second method limits the accessible future in each temporal convolution. This provides flexibility in the network design and, as a result, achieves significantly better performance than the naive approach.
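To make the two approaches concrete, the sketch below estimates the delay (number of future frames) implied by a stack of dilated temporal convolutions. It assumes the standard MS-TCN++ layer pattern of kernel size 3 with dilation 2^l at layer l; the function name and the `max_future` cap are illustrative, not the paper's actual API.

```python
def required_delay(num_layers, kernel_size=3, max_future=None):
    """Estimate future frames needed by a stack of dilated temporal convs.

    Assumes layer l uses dilation 2**l (the usual MS-TCN++ pattern).
    An acausal kernel of size `kernel_size` looks
    (kernel_size - 1) // 2 * dilation frames into the future.
    `max_future` caps the per-layer lookahead, modeling the second method
    described above; `max_future=None` is the fully acausal (offline) case.
    """
    delay = 0
    for layer in range(num_layers):
        dilation = 2 ** layer
        future = (kernel_size - 1) // 2 * dilation
        if max_future is not None:
            future = min(future, max_future)  # limit accessible future
        delay += future
    return delay


# Naive approach: the only way to cut delay is to cut depth (and receptive field).
print(required_delay(10))               # full depth, acausal: 1023 future frames
print(required_delay(4))                # shallower network: 15 future frames
# Second method: keep full depth, cap each layer's lookahead instead.
print(required_delay(10, max_future=8)) # same depth, bounded delay: 63 frames
```

The comparison illustrates why the second method is more flexible: capping the per-layer lookahead bounds the total delay while leaving the network depth, and hence the past receptive field, untouched.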