Robust Semi-Supervised Learning for Histopathology Images through Self-Supervision Guided Out-of-Distribution Scoring

Semi-supervised learning (semi-SL) is a promising alternative to supervised learning for medical image analysis, where high-quality annotations are difficult to obtain. However, semi-SL assumes that the underlying distribution of the unlabeled data matches that of the few labeled samples, an assumption often violated in practical settings, particularly in medical imaging. The presence of out-of-distribution (OOD) samples in the unlabeled training pool of semi-SL is inevitable and can reduce the efficiency of the algorithm. Common preprocessing methods for filtering out outlier samples may not be suitable for medical images, which involve a wide range of anatomical structures and rare morphologies. In this paper, we propose a novel pipeline for addressing open-set semi-supervised learning challenges in digital histology images. Our pipeline uses self-supervised learning to efficiently estimate an OOD score for each unlabeled data point, which calibrates the knowledge passed to a subsequent semi-SL framework. The OOD score modulates sample selection for the semi-SL stage, so that samples conforming to the distribution of the few labeled samples are exposed to the semi-SL framework more frequently. Our pipeline is compatible with any semi-SL framework, and we base our experiments on the popular MixMatch framework. We conduct extensive studies on two digital pathology datasets, the Kather colorectal histology dataset and a dataset derived from TCGA-BRCA whole slide images, and establish the effectiveness of our method by comparing it with popular semi-SL methods and frameworks across various experiments.
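To make the idea of score-modulated sample selection concrete, the following minimal Python sketch draws unlabeled samples with a probability that decreases as their OOD score grows. It is an illustration under stated assumptions, not the method of the paper: the abstract does not specify how scores are mapped to selection frequencies, so the softmax weighting, the `temperature` parameter, and the function names `sampling_weights` and `sample_unlabeled_batch` are all hypothetical. The sketch also assumes that a higher score means a sample is more likely to be out of distribution.

```python
import math
import random


def sampling_weights(ood_scores, temperature=1.0):
    """Map OOD scores to sampling probabilities that sum to 1.

    Assumption: higher score = more outlying. The softmax over negated
    scores is an illustrative choice, not taken from the paper.
    """
    logits = [-score / temperature for score in ood_scores]
    peak = max(logits)  # subtract the max for numerical stability
    unnormalized = [math.exp(logit - peak) for logit in logits]
    total = sum(unnormalized)
    return [value / total for value in unnormalized]


def sample_unlabeled_batch(ood_scores, batch_size, seed=0, temperature=1.0):
    """Draw indices of unlabeled samples, favoring in-distribution ones."""
    rng = random.Random(seed)
    weights = sampling_weights(ood_scores, temperature)
    # Sampling with replacement: low-score samples can appear repeatedly.
    return rng.choices(range(len(weights)), weights=weights, k=batch_size)


# Hypothetical scores: the first three samples look in-distribution,
# the last two look like outliers.
scores = [0.10, 0.20, 0.15, 3.00, 2.50]
probabilities = sampling_weights(scores)
batch_indices = sample_unlabeled_batch(scores, batch_size=8)
```

The indices returned by `sample_unlabeled_batch` would then select the unlabeled batch handed to the semi-SL stage (MixMatch in the experiments described above). Soft weighting keeps every sample reachable, which may matter when rare but valid morphologies receive high scores; a hard threshold would discard them entirely.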
