The NCI Imaging Data Commons as a platform for reproducible research in computational pathology

Daniela P. Schacherer, Markus D. Herrmann, David A. Clunie, Henning Höfener, William Clifford, William J. R. Longabaugh, Steve Pieper, Ron Kikinis, Andrey Fedorov, André Homeyer

Objective: Reproducibility is critical for translating machine learning (ML)-based solutions in computational pathology (CompPath) into practice. However, an increasing number of studies report difficulties in reproducing ML results. The NCI Imaging Data Commons (IDC) is a public repository of >120 cancer image collections, including >38,000 whole-slide images (WSIs), that is designed to be used with cloud-based ML services. Here, we explore the potential of the IDC to facilitate reproducibility of CompPath research.

Materials and Methods: The IDC realizes the FAIR principles: All images are encoded according to the DICOM standard, persistently identified, discoverable via rich metadata, and accessible via open tools. Taking advantage of this, we implemented two experiments in which a representative ML-based method for classifying lung tumor tissue was trained and/or evaluated on different datasets from the IDC. To assess reproducibility, the experiments were run multiple times with independent but identically configured sessions of common ML services.

Results: The AUC values of different runs of the same experiment were generally consistent and in the same order of magnitude as a similar, previously published study. However, there were occasional small variations in AUC values of up to 0.044, indicating a practical limit to reproducibility.

Discussion and conclusion: By realizing the FAIR principles, the IDC enables other researchers to reuse exactly the same datasets. Cloud-based ML services enable others to run CompPath experiments in an identically configured computing environment without having to own high-performance hardware. The combination of both makes it possible to approach the reproducibility limit.
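The run-to-run comparison described in the Results could be quantified along the following lines. This is only a minimal sketch, not the authors' evaluation code: the function names `roc_auc` and `max_auc_deviation`, the run labels, and all numbers are hypothetical toy values, and the AUC is computed with the standard pairwise-ranking definition instead of any particular ML service.

```python
from itertools import combinations


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def max_auc_deviation(aucs):
    """Largest absolute AUC difference between any two runs."""
    return max((abs(a - b) for a, b in combinations(aucs, 2)), default=0.0)


# Hypothetical toy data: one shared set of ground-truth labels and the
# prediction scores from two repeated runs of the same experiment.
labels = [0, 0, 1, 1, 0, 1]
runs = {
    "run_1": [0.10, 0.40, 0.35, 0.80, 0.20, 0.90],
    "run_2": [0.10, 0.30, 0.45, 0.80, 0.20, 0.90],
}

aucs = {name: roc_auc(labels, scores) for name, scores in runs.items()}
deviation = max_auc_deviation(list(aucs.values()))

if __name__ == "__main__":
    for name, auc in aucs.items():
        print(f"{name}: AUC = {auc:.3f}")
    print(f"max deviation between runs = {deviation:.3f}")
```

Under this reading, a figure such as the reported variation of up to 0.044 would correspond to the largest pairwise gap between AUC values of identically configured runs. How the paper aggregates across runs is not stated in the abstract, so that correspondence is an assumption.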

