The distribution and appearance of nuclei are essential markers for the diagnosis and study of cancer. Despite the importance of nuclear morphology, there is a lack of large-scale, accurate, publicly accessible nucleus segmentation data. To address this, we developed an analysis pipeline, with a quality control process, that segments nuclei in whole slide tissue images (WSIs) from multiple cancer types. We have generated nucleus segmentation results for 5,060 WSIs from 10 cancer types in The Cancer Genome Atlas (TCGA). A key component of our work is a multi-level quality control process, at the WSI level and the image patch level, that evaluates the quality of our segmentation results. The image patch-level quality control used manual segmentation ground truth data from 1,356 sampled image patches. The datasets we publish in this work consist of roughly 5 billion quality-controlled nuclei from more than 5,060 TCGA WSIs from 10 different TCGA cancer types, and 1,356 manually segmented TCGA image patches from the same 10 cancer types plus 4 additional cancer types. The data are available at https://doi.org/10.7937/tcia.2019.4a4dkp9u