Objective: Reproducibility is critical for translating machine learning (ML)-based solutions in computational pathology (CompPath) into practice. However, an increasing number of studies report difficulties in reproducing ML results. The NCI Imaging Data Commons (IDC), a public repository designed for use with cloud-based ML services, hosts >120 cancer image collections, including >38,000 whole-slide images (WSIs). Here, we explore the potential of the IDC to facilitate the reproducibility of CompPath research. Materials and Methods: The IDC realizes the FAIR principles: all images are encoded according to the DICOM standard, persistently identified, discoverable via rich metadata, and accessible via open tools. Taking advantage of this, we implemented two experiments in which a representative ML-based method for classifying lung tumor tissue was trained and/or evaluated on different datasets from the IDC. To assess reproducibility, each experiment was run multiple times in independent but identically configured sessions of common ML services. Results: The AUC values of different runs of the same experiment were generally consistent and of the same order of magnitude as those reported in a similar, previously published study. However, there were occasional small variations in AUC values of up to 0.044, indicating a practical limit to reproducibility. Discussion and conclusion: By realizing the FAIR principles, the IDC enables other researchers to reuse exactly the same datasets. Cloud-based ML services enable others to run CompPath experiments in an identically configured computing environment without having to own high-performance hardware. The combination of both makes it possible to approach the reproducibility limit.
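The run-to-run variation above is expressed as a difference in AUC between repeated runs of the same experiment. As a minimal sketch, and not the evaluation code used in the study, the snippet below computes ROC AUC via the Mann-Whitney formulation and reports the largest pairwise AUC difference across runs scored on the same labels. The function names `auc` and `max_auc_spread` and the toy labels and scores are hypothetical illustrations, not data from the experiments.

```python
from itertools import combinations


def auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive is scored higher than a randomly chosen
    negative, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def max_auc_spread(labels, runs):
    """Return (largest absolute AUC difference between any two runs, per-run AUCs).

    Each element of `runs` is one run's list of predicted scores for the
    same ground-truth `labels` (hypothetical stand-ins for repeated,
    identically configured experiment runs)."""
    if len(runs) < 2:
        raise ValueError("need at least two runs to measure variation")
    aucs = [auc(labels, scores) for scores in runs]
    spread = max(abs(a - b) for a, b in combinations(aucs, 2))
    return spread, aucs


# Hypothetical toy example: two runs that differ in one score.
labels = [0, 0, 1, 1]
run_a = [0.1, 0.4, 0.35, 0.8]
run_b = [0.1, 0.3, 0.35, 0.8]
spread, aucs = max_auc_spread(labels, [run_a, run_b])
print(aucs, spread)  # [0.75, 1.0] 0.25
```

With real experiments, each run's scores would come from an independent session; the spread is then the quantity whose maximum the abstract reports as 0.044.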