One of the biggest challenges for applying machine learning to histopathology is weak supervision: whole-slide images have billions of pixels yet often only one global label. The state of the art therefore relies on strongly-supervised model training using additional local annotations from domain experts. However, in the absence of detailed annotations, most weakly-supervised approaches depend on a frozen feature extractor pre-trained on ImageNet. We identify this as a key weakness and propose to train an in-domain feature extractor on histology images using MoCo v2, a recent self-supervised learning algorithm. Experimental results on Camelyon16 and TCGA show that the proposed extractor greatly outperforms its ImageNet counterpart. In particular, our results improve the weakly-supervised state of the art on Camelyon16 from 91.4% to 98.7% AUC, thereby closing the gap with strongly-supervised models that reach 99.3% AUC. Through these experiments, we demonstrate that feature extractors trained via self-supervised learning can act as drop-in replacements to significantly improve existing machine learning techniques in histology. Lastly, we show that the learned embedding space exhibits biologically meaningful separation of tissue structures.
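As a rough illustration of the contrastive objective behind MoCo-style self-supervised training, the following minimal sketch computes an InfoNCE loss for one query embedding against one positive key and a queue of negative keys, together with the momentum update of the key encoder. This is not the authors' implementation. The function names, the plain-list vector representation, the temperature of 0.2, and the momentum of 0.999 are all assumptions made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(query, positive_key, negative_keys, temperature=0.2):
    """InfoNCE loss: cross-entropy with the positive key as the target class.

    `temperature=0.2` is an assumed default for this sketch.
    """
    q = l2_normalize(query)
    # Logit 0 is the positive pair; the remaining logits are the negatives.
    logits = [dot(q, l2_normalize(positive_key)) / temperature]
    logits += [dot(q, l2_normalize(k)) / temperature for k in negative_keys]
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


def momentum_update(key_params, query_params, momentum=0.999):
    """Move key-encoder parameters slowly toward the query encoder's.

    `momentum=0.999` is an assumed value for this sketch.
    """
    return [momentum * k + (1.0 - momentum) * q
            for k, q in zip(key_params, query_params)]
```

The loss is small when the query is close to its positive key (another augmented view of the same tile) and far from the queued negatives. After such training, the encoder would be frozen and used as a feature extractor in place of an ImageNet-pretrained one.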