The development of deep segmentation models for computational pathology (CPath) can help foster the investigation of interpretable morphological biomarkers. Yet, a major bottleneck to the success of such approaches is that supervised deep learning models require an abundance of accurately labelled data. This issue is exacerbated in CPath because generating detailed annotations usually demands the input of a pathologist, who is able to distinguish between different tissue constructs and nuclei. Manually labelling nuclei may not be a feasible approach for collecting large-scale annotated datasets, especially when a single image region can contain thousands of different cells. However, relying solely on the automatic generation of annotations will limit the accuracy and reliability of the ground truth. Therefore, to help overcome these challenges, we propose a multi-stage annotation pipeline that enables the collection of large-scale datasets for histology image analysis, with pathologist-in-the-loop refinement steps. Using this pipeline, we generate the largest known nuclear instance segmentation and classification dataset, containing nearly half a million labelled nuclei in H&E stained colon tissue. We have released the dataset and encourage the research community to utilise it to drive forward the development of downstream cell-based models in CPath.