Confocal laser endomicroscopy (CLE) is a non-invasive, real-time imaging modality that can be used for in-situ, in-vivo imaging and microstructural analysis of mucosal structures. Diagnosis with CLE is, however, complicated by images that are hard for inexperienced physicians to interpret. Machine learning would therefore be a beneficial augmentative tool, but its use is hampered by the shortage of histopathology-correlated CLE imaging sequences relative to the plurality of patterns in this domain, which leads to overfitting of machine learning models. To overcome this, self-supervised learning (SSL) can be employed on larger unlabeled datasets. CLE is a video-based modality with high inter-frame correlation, which leads to a non-stratified data distribution for SSL training. In this work, we propose a filter for CLE video sequences that reduces dataset redundancy in SSL training and improves SSL training convergence and efficiency. For the evaluation, we use four state-of-the-art baseline networks and an SSL teacher-student network with a vision transformer small backbone. These networks were evaluated on downstream tasks for a sinonasal tumor dataset and a squamous cell carcinoma of the skin dataset. On both datasets, the filtered SSL-pretrained model achieved the highest test accuracy, 67.48% and 73.52% respectively, considerably outperforming the non-SSL baselines. Our results show that SSL is an effective method for CLE pretraining. Further, we show that the proposed CLE video filter improves training efficiency in self-supervised scenarios, reducing training time by 67%.
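To make the idea of reducing redundancy among highly correlated video frames concrete, the following is a minimal sketch of one possible filter. It is an illustration under stated assumptions, not the filter proposed in this work: the abstract does not specify the filtering criterion, so the Pearson-correlation test, the greedy comparison against the last kept frame, and the names `frame_correlation`, `filter_redundant_frames`, and `max_corr` are all hypothetical.

```python
import numpy as np


def frame_correlation(a, b):
    """Pearson correlation between two grayscale frames of equal shape."""
    # astype() copies, so the in-place centering below leaves the inputs untouched.
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # At least one frame is constant. Treat two constant frames as
        # fully redundant and any other pairing as uncorrelated.
        return 1.0 if not a.any() and not b.any() else 0.0
    return float(a @ b / denom)


def filter_redundant_frames(frames, max_corr=0.9):
    """Return indices of frames to keep from one video sequence.

    Assumed criterion (not taken from the source): a frame is kept only
    if its correlation with the most recently kept frame is below
    `max_corr`. The first frame is always kept.
    """
    kept = []
    reference = None
    for idx, frame in enumerate(frames):
        if reference is None or frame_correlation(reference, frame) < max_corr:
            kept.append(idx)
            reference = frame
    return kept
```

Applied to a sequence such as `[f0, f0_with_slight_noise, f1, f1_copy]`, where `f0` and `f1` are unrelated frames, this sketch would keep only the first frame of each near-duplicate run. Comparing against the last kept frame rather than the immediately preceding one avoids retaining long runs of slowly drifting frames; the threshold of 0.9 is an arbitrary placeholder.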