Pathology image analysis relies heavily on the availability and quality of annotated pathological samples, which are difficult to collect and require substantial human effort. To address this issue, mixing-based approaches offer an effective and practical complement to traditional preprocessing-based data augmentation. However, previous mixing-based data augmentation methods do not thoroughly explore the essential characteristics of pathology images, including local specificity, global distribution, and inner-/outer-sample instance relationships. To better capture these characteristics and generate effective pseudo samples, we propose the CellMix framework with a novel distribution-based in-place shuffle strategy. We split the images into patches at the granularity of pathology instances and shuffle the patches across the samples of the same batch. In this way, we generate new samples while keeping the absolute relationship of pathology instances intact. Furthermore, to deal with the resulting perturbations and distribution-based noise, we devise a loss-driven strategy inspired by curriculum learning, which lets the model fit the augmented data adaptively during training. Notably, we are the first to explore data augmentation techniques in the pathology image field. Experiments show state-of-the-art (SOTA) results on 7 different datasets. We conclude that this novel instance relationship-based strategy can shed light on general data augmentation for pathology image analysis. The code is available at https://github.com/sagizty/CellMix.
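The in-place shuffle described above can be illustrated with a minimal NumPy sketch. It is not the CellMix implementation (see the repository for that); it shows one plausible reading of the idea, in which a fraction of grid positions is chosen and, at each chosen position, the patches are permuted across the images of the batch so that every patch keeps its spatial location. The function name `inplace_patch_shuffle`, the `ratio` parameter, and the returned `source` index map are hypothetical names introduced here for illustration.

```python
import numpy as np


def inplace_patch_shuffle(batch, patch_size, ratio, rng):
    """Illustrative sketch (not the authors' code).

    batch: array of shape (B, C, H, W).
    ratio: fraction of grid positions whose patches are shuffled across the batch.
    Returns the mixed batch and source[i, k], the index of the image that
    supplied the patch at grid position k of output image i.
    """
    b, c, h, w = batch.shape
    if h % patch_size or w % patch_size:
        raise ValueError("H and W must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    n_pos = gh * gw

    # (B, C, H, W) -> (B, gh*gw, C, p, p): one entry per grid position.
    patches = batch.reshape(b, c, gh, patch_size, gw, patch_size)
    patches = patches.transpose(0, 2, 4, 1, 3, 5)
    patches = patches.reshape(b, n_pos, c, patch_size, patch_size).copy()

    # Initially every patch comes from its own image.
    source = np.tile(np.arange(b)[:, None], (1, n_pos))

    # Pick the grid positions to shuffle, then permute along the batch axis only,
    # so each patch stays at the same spatial position.
    n_shuffle = int(round(ratio * n_pos))
    positions = rng.choice(n_pos, size=n_shuffle, replace=False)
    for k in positions:
        perm = rng.permutation(b)
        patches[:, k] = patches[perm, k]
        source[:, k] = perm

    # (B, gh*gw, C, p, p) -> (B, C, H, W)
    mixed = patches.reshape(b, gh, gw, c, patch_size, patch_size)
    mixed = mixed.transpose(0, 3, 1, 4, 2, 5).reshape(b, c, h, w)
    return mixed, source


# Usage: four 3-channel 8x8 images, 4x4 patches, half of the positions shuffled.
rng = np.random.default_rng(0)
images = rng.random((4, 3, 8, 8)).astype(np.float32)
mixed, source = inplace_patch_shuffle(images, patch_size=4, ratio=0.5, rng=rng)
```

The `source` map could be used to build soft labels, for example by weighting each source image's label by the share of patches it contributed; that labeling rule is likewise an assumption of this sketch, not something stated in the abstract.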