The lack of well-annotated datasets in computational pathology (CPath) obstructs the application of deep learning techniques for classifying medical images. Since pathologist time is expensive, dataset curation is intrinsically difficult. Many CPath workflows involve transferring learned knowledge between image domains through transfer learning. Currently, most transfer learning research follows a model-centric approach, tuning network parameters to improve transfer results over a few datasets. In this paper, we take a data-centric approach to the transfer learning problem and examine the existence of generalizable knowledge between histopathological datasets. First, we create a standardization workflow for aggregating existing histopathological data. We then measure inter-domain knowledge by training ResNet18 models on multiple histopathological datasets and cross-transferring between them to determine the quantity and quality of innate shared knowledge. Additionally, we use weight distillation to share knowledge between models without additional training. We find that hard-to-learn, multi-class datasets benefit most from pretraining, and that a two-stage learning framework incorporating a large source domain such as ImageNet allows for better utilization of smaller datasets. Furthermore, we find that weight distillation enables models trained on purely histopathological features to outperform models using external natural-image data.