Recording surgery in operating rooms is essential for medical education and for evaluating treatment. However, the desired targets, such as the surgical field, surgical tools, or the doctor's hands, are difficult to record because they are heavily occluded during surgery. We use a recording system in which multiple cameras are embedded in the surgical lamp, and we assume that at least one camera captures the target without occlusion at any given time. Because the embedded cameras produce multiple video sequences, we address the task of selecting the camera with the best view of the surgery. Unlike the conventional method, which selects the camera based on the area of the surgical field, we propose a deep neural network that predicts camera selection probabilities from multiple video sequences, trained with expert annotations as supervision. We created a dataset in which six different types of plastic surgery are recorded, and we annotated it with camera-switching labels. Our experiments show that our approach successfully switches between cameras and outperforms three baseline methods.
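To make the selection step concrete, the following is a minimal sketch of how per-camera scores could be turned into a camera choice for each frame, alongside the area-based rule that the abstract describes as the conventional method. It is not the network itself: the function names (`softmax`, `select_cameras`, `select_by_area`) and the input layout (one list of per-camera values per frame) are assumptions made for illustration, and the scores stand in for whatever a trained model would output.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw per-camera scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_cameras(frame_scores: Sequence[Sequence[float]]) -> List[int]:
    """For each frame, pick the camera with the highest selection probability.

    frame_scores[t][c] is a hypothetical model score for camera c at frame t.
    """
    selected = []
    for scores in frame_scores:
        probs = softmax(scores)
        selected.append(max(range(len(probs)), key=probs.__getitem__))
    return selected


def select_by_area(frame_areas: Sequence[Sequence[float]]) -> List[int]:
    """Area-based rule: pick the camera that sees the largest surgical-field area.

    frame_areas[t][c] is an assumed, precomputed area (e.g. in pixels).
    """
    return [max(range(len(a)), key=a.__getitem__) for a in frame_areas]


if __name__ == "__main__":
    scores = [[0.1, 2.0, -1.0], [3.0, 0.5, 0.2]]
    print(select_cameras(scores))
    areas = [[120.0, 80.0, 300.0], [50.0, 400.0, 10.0]]
    print(select_by_area(areas))
```

Both functions return one camera index per frame, so their outputs can be compared directly against annotated camera-switching labels.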