Annotating cancerous regions in whole-slide images (WSIs) of pathology samples plays a critical role in clinical diagnosis, biomedical research, and the development of machine learning algorithms. However, generating exhaustive and accurate annotations is labor-intensive, challenging, and costly. Drawing only coarse, approximate annotations is much easier and cheaper, and it alleviates pathologists' workload. In this paper, we study the problem of refining these approximate annotations in digital pathology to obtain more accurate ones. Some previous works have explored learning machine learning models from such inaccurate annotations, but few of them tackle the refinement problem, in which the mislabeled regions must be explicitly identified and corrected, and all of them require a (often very large) number of training samples. We present a method, named Label Cleaning Multiple Instance Learning (LC-MIL), to refine coarse annotations on a single WSI without the need for external training data. Patches cropped from a WSI with inaccurate labels are processed jointly within a MIL framework, and a deep-attention mechanism is leveraged to discriminate mislabeled instances, mitigating their impact on the predictive model and refining the segmentation. Our experiments on a heterogeneous WSI set with breast cancer lymph node metastasis, liver cancer, and colorectal cancer samples show that LC-MIL significantly refines the coarse annotations, outperforming state-of-the-art alternatives, even while learning from a single slide. These results demonstrate that LC-MIL is a promising, lightweight tool for providing fine-grained annotations from coarsely annotated pathology sets.
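To make the attention idea concrete, the following is a minimal sketch of generic attention-based MIL pooling over one bag of patch features. It is not the LC-MIL implementation: the function name `attention_pool`, the parameters `V` and `w`, the feature dimensions, and the random inputs are all assumptions chosen for illustration, and a real system would learn these parameters and extract patch features with a neural network.

```python
import math
import random


def attention_pool(instances, V, w):
    """Generic attention-based MIL pooling over one bag (illustrative only).

    instances: list of K feature vectors, each a list of D floats
    V: L x D projection matrix (hypothetical learned parameter)
    w: length-L scoring vector (hypothetical learned parameter)
    Returns (attention weights, bag embedding).
    """
    # Score each instance: w^T tanh(V h)
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ui for wi, ui in zip(w, hidden)))

    # Numerically stable softmax turns scores into weights that sum to 1
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Bag embedding is the attention-weighted average of instance features
    dim = len(instances[0])
    embedding = [sum(a * h[j] for a, h in zip(weights, instances))
                 for j in range(dim)]
    return weights, embedding


# Toy bag: K patches with D-dimensional features (random stand-ins)
random.seed(0)
K, D, L = 5, 4, 3
bag = [[random.gauss(0, 1) for _ in range(D)] for _ in range(K)]
V = [[random.gauss(0, 1) for _ in range(D)] for _ in range(L)]
w = [random.gauss(0, 1) for _ in range(L)]

weights, embedding = attention_pool(bag, V, w)
```

In a pooling scheme of this kind, instances that receive low attention weights contribute little to the bag embedding, which is one way a model can reduce the influence of individual patches on the bag-level prediction.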