Learning Representations with Contrastive Self-Supervised Learning for Histopathology Applications

From arXiv. Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA): https://www.melba-journal.org/papers/2022:023.html

Unsupervised learning has made substantial progress over the last few years, especially by means of contrastive self-supervised learning. The dominant benchmark dataset for self-supervised learning has been ImageNet, on which recent methods are approaching the performance achieved by fully supervised training. ImageNet is, however, largely object-centric, and it is not yet clear what potential these methods have on widely different datasets and tasks that are not object-centric, such as those in digital pathology. While self-supervised learning has started to be explored in this area with encouraging results, there is reason to look more closely at how this setting differs from natural images and ImageNet. In this paper we make an in-depth analysis of contrastive learning for histopathology, pinpointing how the contrastive objective behaves differently due to the characteristics of histopathology data. We bring forward a number of considerations, such as view generation for the contrastive objective and hyper-parameter tuning. In a large battery of experiments, we analyze how downstream performance in tissue classification is affected by these considerations. The results indicate that contrastive learning can reduce the annotation effort within digital pathology, but that the specific dataset characteristics need to be considered. To take full advantage of the contrastive learning objective, different calibrations of view generation and hyper-parameters are required. Our results pave the way for realizing the full potential of self-supervised learning for histopathology applications.
