Computational pathology can help save human lives, but its models are annotation-hungry and pathology images are notoriously expensive to annotate. Self-supervised learning (SSL) has been shown to be an effective method for utilizing unlabeled data, and its application to pathology could greatly benefit downstream tasks. Yet there are no principled studies that compare SSL methods and discuss how to adapt them for pathology. To address this need, we execute the largest-scale study of SSL pre-training on pathology image data to date. Our study is conducted using 4 representative SSL methods on diverse downstream tasks. We establish that large-scale domain-aligned pre-training in pathology consistently outperforms ImageNet pre-training in standard SSL settings such as linear and fine-tuning evaluations, as well as in low-label regimes. Moreover, we propose a set of domain-specific techniques that we experimentally show lead to a performance boost. Lastly, for the first time, we apply SSL to the challenging task of nuclei instance segmentation and show large and consistent performance improvements under diverse settings.