Realistic Data Enrichment for Robust Image Segmentation in Histopathology

Poor performance of quantitative analysis in histopathological Whole Slide Images (WSIs) has been a significant obstacle in clinical practice. Annotating large-scale WSIs manually is a demanding and time-consuming task, unlikely to yield the expected results when used for fully supervised learning systems. Rarely observed disease patterns and large differences in object scales are difficult to model through conventional patient intake. Prior methods either fall back to direct disease classification, which only requires learning a few factors per image, or report average image segmentation performance, which is highly biased towards majority observations. Geometric image augmentation is commonly used to improve robustness for average-case predictions and to enrich limited datasets. So far, no method has provided sampling of a realistic posterior distribution to improve stability, e.g., for the segmentation of imbalanced objects within images. Therefore, we propose a new approach, based on diffusion models, which can enrich an imbalanced dataset with plausible examples from underrepresented groups by conditioning on segmentation maps. Our method can simply expand limited clinical datasets, making them suitable for training machine learning pipelines, and provides an interpretable and human-controllable way of generating histopathology images that are indistinguishable from real ones to human experts. We validate our findings on two datasets, one from the public domain and one from a Kidney Transplant study.
