Progress in digital pathology is hindered by the very high resolution of the images and the prohibitive cost of exhaustive localized annotations. The common paradigm for classifying pathology images is patch-based processing, which often incorporates multiple instance learning (MIL) to aggregate local patch-level representations into an image-level prediction. Nonetheless, diagnostically relevant regions may occupy only a small fraction of the whole tissue, and MIL-based aggregation assumes that all patch representations are independent, thereby discarding contextual information from adjacent cells and the tissue microenvironment. Consequently, the computational resources dedicated to a specific region are independent of its information contribution. This paper proposes a transformer-based architecture specifically tailored for histopathological image classification, which combines fine-grained local attention with a coarse global attention mechanism to learn meaningful representations of high-resolution images in a computationally efficient manner. More importantly, based on the observation above, we propose a novel mixing-based data-augmentation strategy, namely ScoreMix, which leverages the distribution of semantic regions in the images during training and guides the data mixing by sampling the locations of discriminative image content. Thorough experiments and ablation studies on three challenging, representative Haematoxylin & Eosin (H&E) tumour region-of-interest (TRoI) datasets validate the superiority of our approach over existing state-of-the-art methods and the effectiveness of the proposed components, such as the data augmentation, in improving classification performance. We also demonstrate our method's interpretability, robustness, and cross-domain generalization capability.
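To make the idea of score-guided mixing concrete, the following is a minimal NumPy sketch and not the authors' ScoreMix algorithm, whose details the abstract does not give. It assumes each image comes with a hypothetical non-negative per-cell relevance map (for instance, attention averaged over a grid of patch-sized cells). A source cell is sampled in proportion to its score, a destination cell in proportion to its inverse score, and the labels are mixed by pasted area, as in CutMix-style augmentation. The function name `score_guided_mix` and all of its parameters are illustrative assumptions.

```python
import numpy as np


def score_guided_mix(src_img, src_score, dst_img, dst_score,
                     src_label, dst_label, patch, rng):
    """Paste one (patch x patch) cell of src_img onto dst_img.

    Hypothetical sketch: src_score and dst_score are assumed non-negative
    relevance maps of shape (H // patch, W // patch); labels are one-hot
    (or soft) vectors. This is not the paper's exact procedure.
    """
    # Sample the source cell in proportion to its score, so that
    # discriminative content is more likely to be copied.
    p_src = src_score.ravel() / src_score.sum()
    src_idx = rng.choice(p_src.size, p=p_src)

    # Sample the destination cell in proportion to its inverse score,
    # so that less informative content is more likely to be overwritten.
    inv = dst_score.max() - dst_score.ravel() + 1e-8
    p_dst = inv / inv.sum()
    dst_idx = rng.choice(p_dst.size, p=p_dst)

    sr, sc = np.unravel_index(src_idx, src_score.shape)
    dr, dc = np.unravel_index(dst_idx, dst_score.shape)

    # Copy the destination image and overwrite the chosen cell.
    mixed = dst_img.copy()
    mixed[dr * patch:(dr + 1) * patch, dc * patch:(dc + 1) * patch] = \
        src_img[sr * patch:(sr + 1) * patch, sc * patch:(sc + 1) * patch]

    # Mix the labels by the fraction of the image area that was pasted.
    lam = (patch * patch) / (dst_img.shape[0] * dst_img.shape[1])
    mixed_label = (1.0 - lam) * dst_label + lam * src_label
    return mixed, mixed_label
```

A call might pass two images of equal size, their relevance maps, one-hot labels, a cell size, and a `np.random.default_rng(...)` generator. Mixing the labels by area is one simple choice; weighting them by the scores of the pasted and overwritten cells would be another.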