Annotating cancerous regions in whole-slide images (WSIs) of pathology samples plays a critical role in clinical diagnosis, biomedical research, and the development of machine learning algorithms. However, generating exhaustive and accurate annotations is labor-intensive, challenging, and costly. Drawing only coarse, approximate annotations is much easier and less costly, and it alleviates pathologists' workload. In this paper, we study the problem of refining these approximate annotations in digital pathology to obtain more accurate ones. Some previous works have explored training machine learning models from such inaccurate annotations, but few of them tackle the refinement problem, in which the mislabeled regions must be explicitly identified and corrected, and all of them require a number of training samples that is often very large. We present a method, named Label Cleaning Multiple Instance Learning (LC-MIL), to refine coarse annotations on a single WSI without the need for external training data. Patches cropped from a WSI with inaccurate labels are processed jointly within a multiple instance learning framework, mitigating the impact of the inaccurate labels on the predictive model and refining the segmentation. Our experiments on a heterogeneous WSI set with breast cancer lymph node metastasis, liver cancer, and colorectal cancer samples show that LC-MIL significantly refines the coarse annotations, outperforming state-of-the-art alternatives, even while learning from a single slide. Moreover, we demonstrate how real annotations drawn by pathologists can be efficiently refined and improved by the proposed approach. All these results demonstrate that LC-MIL is a promising, lightweight tool to provide fine-grained annotations from coarsely annotated pathology sets.
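To make the idea of processing patches jointly more concrete, the following minimal sketch groups patches from a single slide into bags that share one coarse label. It illustrates the general multiple instance learning setup only and is not the LC-MIL implementation: the function name `make_bags`, the parameters `bag_size` and `n_bags`, and the random sampling strategy are all assumptions, since the abstract does not specify how bags are formed.

```python
import numpy as np

def make_bags(coarse_labels, bag_size=8, n_bags=20, seed=0):
    """Sample bags of patch indices, one set of bags per coarse class.

    Hypothetical helper (not from the paper).
    coarse_labels: 1-D array of 0/1 patch labels read off the coarse annotation.
    Returns (bags, bag_labels), where bags has shape (num_bags_total, bag_size)
    and every patch in a bag carries the same coarse label as the bag.
    """
    rng = np.random.default_rng(seed)
    coarse_labels = np.asarray(coarse_labels)
    bags, bag_labels = [], []
    for label in (0, 1):
        # indices of all patches that the coarse annotation assigns to this class
        pool = np.flatnonzero(coarse_labels == label)
        if pool.size == 0:
            continue
        for _ in range(n_bags):
            # sample with replacement only when the pool is smaller than a bag
            bag = rng.choice(pool, size=bag_size, replace=pool.size < bag_size)
            bags.append(bag)
            bag_labels.append(label)
    return np.stack(bags), np.array(bag_labels)


# Toy slide: 30 patches coarsely marked normal (0), 10 marked cancerous (1).
labels = np.array([0] * 30 + [1] * 10)
bags, bag_labels = make_bags(labels, bag_size=8, n_bags=5)
```

A bag-level classifier trained on such bags sees each noisy patch label only alongside other patches from the same coarse region, which is one plausible reading of how joint processing can dampen the effect of individual mislabeled patches.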