Siamese Network Features for Endoscopy Image and Video Localization

Conventional Endoscopy (CE) and Wireless Capsule Endoscopy (WCE) are established tools for diagnosing disorders of the gastrointestinal (GI) tract. Localizing frames provides valuable information about the location of an anomaly and can also help clinicians choose a more appropriate treatment plan. Many automated algorithms exist for detecting anomalies; however, very few existing works address localization. In this study, we present a combination of meta-learning and deep learning for localizing both endoscopy images and videos. A dataset was collected from 10 different anatomical positions of the human GI tract. In the meta-learning stage, a modified Siamese Neural Network (SNN) was trained on 78 CE and 27 WCE annotated frames to predict the location of a single image/frame. A postprocessing stage based on a bidirectional long short-term memory (BiLSTM) network is then proposed for localizing a sequence of frames; it takes as input the feature vector, distance, and predicted location obtained from the trained SNN. The postprocessing stage was trained and tested on 1,028 seconds of CE video and 365 seconds of WCE video using hold-out validation (50%), achieving F1-scores of 86.3% and 83.0%, respectively. In addition, we performed a subjective evaluation with nine gastroenterologists. The results show that the computer-aided methods can outperform the gastroenterologists' assessment of localization. The proposed method was compared with various approaches, including a support vector machine with hand-crafted features, a convolutional neural network, and transfer learning-based methods, and it achieved better results. It can therefore be used for frame localization, which in turn can support video summarization and anomaly detection.
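A minimal sketch of the single-frame step, assuming a trained Siamese branch has already mapped every frame to an embedding vector: the query frame is assigned the anatomical position whose annotated reference embeddings lie closest, and that distance is returned alongside the label (the two quantities the sequence stage reuses). Averaging each position's references into one prototype and using Euclidean distance are our simplifications, not details given in the abstract; the position names, toy embeddings, and the function names `build_prototypes` and `localize_frame` are hypothetical.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_prototypes(references):
    """Average annotated reference embeddings per anatomical position.

    references: list of (position_label, embedding) pairs (hypothetical format;
    the embeddings are assumed to come from a trained Siamese branch).
    """
    sums = {}
    counts = defaultdict(int)
    for label, emb in references:
        if label not in sums:
            sums[label] = [0.0] * len(emb)
        sums[label] = [s + x for s, x in zip(sums[label], emb)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def localize_frame(query_emb, prototypes):
    """Return (predicted_position, distance) for one query frame embedding."""
    best_label, best_dist = None, math.inf
    for label, proto in prototypes.items():
        d = euclidean(query_emb, proto)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist


# Hypothetical toy 2-D embeddings standing in for real Siamese outputs.
refs = [
    ("esophagus", [0.0, 1.0]), ("esophagus", [0.2, 0.8]),
    ("stomach", [1.0, 0.0]), ("stomach", [0.8, 0.2]),
]
protos = build_prototypes(refs)
print(localize_frame([0.9, 0.1], protos))
```

Because only distances in the embedding space are compared, a new position could in principle be added by supplying a few annotated reference frames rather than retraining a classifier head, which fits the few-shot training set sizes reported above.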

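The sequence stage can be sketched in the same spirit. The abstract states only that a BiLSTM consumes the SNN's feature vector, distance, and predicted location for each frame; the one-hot encoding of the predicted location, the hidden size, the linear read-out, and the helper names `frame_input` and `bilstm_localize` are our assumptions. The weights below are random and untrained, so the sketch illustrates only the data flow: the frame sequence is processed once left to right and once right to left, and the two hidden states of each frame are concatenated before it is classified into one of the 10 positions. A real system would use a deep learning framework and learn the weights from labelled video.

```python
import math
import random

NUM_POSITIONS = 10  # the 10 anatomical positions in the dataset


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def frame_input(feature, distance, predicted_position):
    """Concatenate SNN feature vector, distance, and one-hot predicted position (assumed encoding)."""
    one_hot = [0.0] * NUM_POSITIONS
    one_hot[predicted_position] = 1.0
    return list(feature) + [distance] + one_hot


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LSTM:
    """Single-direction LSTM, forward pass only, with untrained random weights."""

    def __init__(self, input_size, hidden_size, rng):
        self.n = hidden_size
        # One stacked matrix for the four gates: input, forget, candidate, output.
        self.W = rand_matrix(4 * hidden_size, input_size + hidden_size, rng)
        self.bias = [0.0] * (4 * hidden_size)

    def run(self, sequence):
        n = self.n
        h, c = [0.0] * n, [0.0] * n
        outputs = []
        for x in sequence:
            z = [wx + b for wx, b in zip(matvec(self.W, list(x) + h), self.bias)]
            i = [sigmoid(v) for v in z[0:n]]
            f = [sigmoid(v) for v in z[n:2 * n]]
            g = [math.tanh(v) for v in z[2 * n:3 * n]]
            o = [sigmoid(v) for v in z[3 * n:4 * n]]
            c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
            h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
            outputs.append(h)
        return outputs


def bilstm_localize(sequence, hidden_size=8, seed=0):
    """Per-frame position labels from a bidirectional LSTM plus a linear read-out."""
    rng = random.Random(seed)
    input_size = len(sequence[0])
    fwd = LSTM(input_size, hidden_size, rng)
    bwd = LSTM(input_size, hidden_size, rng)
    head = rand_matrix(NUM_POSITIONS, 2 * hidden_size, rng)
    forward_states = fwd.run(sequence)
    backward_states = bwd.run(sequence[::-1])[::-1]  # re-align to frame order
    labels = []
    for hf, hb in zip(forward_states, backward_states):
        scores = matvec(head, hf + hb)
        labels.append(scores.index(max(scores)))
    return labels


# Hypothetical 5-frame clip with 4-D SNN features, a distance, and a predicted position.
rng = random.Random(1)
clip = [frame_input([rng.random() for _ in range(4)], rng.random(),
                    rng.randrange(NUM_POSITIONS)) for _ in range(5)]
print(bilstm_localize(clip))
```

Running the backward LSTM on the reversed sequence and reversing its outputs back lets each frame's prediction draw on both earlier and later frames, which is the usual motivation for a bidirectional model when isolated single-frame predictions are noisy.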