Unsupervised Shot Boundary Detection for Temporal Segmentation of Long Capsule Endoscopy Videos

Physicians use Capsule Endoscopy (CE) as a non-invasive, non-surgical procedure to examine the entire gastrointestinal (GI) tract for diseases and abnormalities. A single CE examination can last between 8 and 11 hours and generate up to 80,000 frames, which are compiled into a video. Physicians have to review and analyze the entire video to identify abnormalities or diseases before making a diagnosis. This review task can be tedious, time-consuming, and prone to error. Although as few as one frame may capture content relevant to the physician's final diagnosis, the frames covering the small bowel region alone can number as many as 50,000. To minimize physicians' review time and effort, this paper proposes a novel unsupervised and computationally efficient temporal segmentation method that automatically partitions long CE videos into homogeneous and identifiable video segments. However, searching for temporal boundaries in a long video using a high-dimensional frame-feature matrix is computationally prohibitive and impractical for real clinical application. Therefore, leveraging both spatial and temporal information in the video, we first extracted high-level frame features using a pretrained CNN model and then projected the high-dimensional frame-feature matrix to a 1-dimensional embedding. Using this 1-dimensional sequence embedding, we applied the Pruned Exact Linear Time (PELT) algorithm to search for temporal boundaries that indicate the transition points from normal to abnormal frames and vice versa. We experimented with multiple real patients' CE videos, and our model achieved an AUC of 66% on multiple test videos against expert-provided labels.
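As an illustration of the final step, the following is a minimal sketch of a PELT-style search on a 1-dimensional sequence. It is not the authors' implementation: the function name `pelt_mean_shift`, the squared-error (mean-shift) segment cost, the penalty value, and the synthetic sequence standing in for the projected CNN features are all assumptions made for this example.

```python
def pelt_mean_shift(signal, penalty):
    """Return segment end indices found by a PELT-style search.

    Assumption: each segment is scored by its squared deviation from the
    segment mean; the paper's actual cost function is not stated here.
    """
    n = len(signal)
    # Prefix sums let us score any segment signal[a:b] in constant time.
    s1, s2 = [0.0], [0.0]
    for x in signal:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(a, b):
        seg_sum = s1[b] - s1[a]
        return (s2[b] - s2[a]) - seg_sum * seg_sum / (b - a)

    best = [0.0] * (n + 1)   # best[t]: optimal penalized cost of signal[:t]
    best[0] = -penalty       # so the first segment carries no net penalty
    prev = [0] * (n + 1)     # prev[t]: start of the last segment ending at t
    candidates = [0]         # admissible last change points

    for t in range(1, n + 1):
        scored = [(best[s] + cost(s, t) + penalty, s) for s in candidates]
        best[t], prev[t] = min(scored)
        # Pruning step: drop any s that can never again be optimal.
        candidates = [s for v, s in scored if v - penalty <= best[t]]
        candidates.append(t)

    # Backtrack from the end to recover the boundaries.
    boundaries, t = [], n
    while t > 0:
        boundaries.append(t)
        t = prev[t]
    return boundaries[::-1]


# Synthetic stand-in for a 1-D frame embedding: 50 "normal" frames,
# 30 "abnormal" frames, 40 "normal" frames, with a small deterministic ripple.
levels = [0.0] * 50 + [5.0] * 30 + [0.0] * 40
signal = [lv + 0.1 * ((i % 3) - 1) for i, lv in enumerate(levels)]

boundaries = pelt_mean_shift(signal, penalty=10.0)
print(boundaries)
```

The returned list holds the end index of each segment, so the last entry always equals the sequence length. In practice the penalty controls how many boundaries are reported and would need tuning against the expert labels.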
