Progress in digital pathology is hindered by the very high resolution of the images and the prohibitive cost of exhaustive localized annotations. The most common paradigm for classifying pathology images is patch-based processing, which often incorporates multiple instance learning (MIL) to aggregate local patch-level representations into an image-level prediction. However, diagnostically relevant regions may occupy only a small fraction of the whole tissue, and current MIL-based approaches often process images uniformly, discarding inter-patch interactions. To alleviate these issues, we propose ScoreNet, a new efficient transformer that exploits a differentiable recommendation stage to extract discriminative image regions and dedicate computational resources accordingly. The proposed transformer leverages local and global attention over a few dynamically recommended high-resolution regions at an efficient computational cost. We further introduce a novel mixing data augmentation, ScoreMix, which leverages the image's semantic distribution to guide the data mixing and produce coherent sample-label pairs. ScoreMix is embarrassingly simple and mitigates the pitfalls of previous augmentations, which assume a uniform semantic distribution and risk mislabeling the samples. Thorough experiments and ablation studies on three Haematoxylin & Eosin (H&E) stained breast cancer histology datasets validate the superiority of our approach over prior art, including transformer-based models, on tumour regions-of-interest (TRoIs) classification. ScoreNet equipped with the proposed ScoreMix augmentation demonstrates better generalization capabilities and achieves new state-of-the-art (SOTA) results with only 50% of the data compared to other mixing augmentation variants. Finally, ScoreNet yields high efficacy and outperforms SOTA efficient transformers, namely TransPath and SwinTransformer.
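To make the patch-based MIL paradigm mentioned above concrete, the following is a minimal sketch of one common aggregation scheme, attention-based pooling, written in plain Python. It is not ScoreNet and is not taken from this work; the function names, the toy embeddings, and the attention vector are hypothetical and serve only as an illustration. Each patch embedding receives a scalar score, the scores are normalized with a softmax, and the image-level representation is the resulting weighted average.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_pool(
    patch_embeddings: Sequence[Sequence[float]],
    attention_vector: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Aggregate patch embeddings into one image-level embedding.

    Each patch gets a scalar score (dot product with a hypothetical
    learned attention vector); the softmax of those scores weights
    the average of the patch embeddings.
    """
    scores = [
        sum(x * w for x, w in zip(patch, attention_vector))
        for patch in patch_embeddings
    ]
    weights = softmax(scores)
    dim = len(patch_embeddings[0])
    pooled = [
        sum(wt * patch[d] for wt, patch in zip(weights, patch_embeddings))
        for d in range(dim)
    ]
    return pooled, weights


if __name__ == "__main__":
    # Toy bag: three 2-D patch embeddings (assumed values, illustration only).
    bag = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
    attn = [1.0, 1.0]  # hypothetical attention parameters
    pooled, weights = attention_mil_pool(bag, attn)
    print("weights:", [round(w, 3) for w in weights])
    print("pooled:", [round(v, 3) for v in pooled])
```

In this sketch every patch is scored independently of the others, which illustrates the limitation noted above: the aggregation does not model interactions between patches.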