Classical multiple instance learning (MIL) methods are often based on the independent and identically distributed (i.i.d.) assumption among instances, thereby neglecting the potentially rich contextual information beyond individual entities. On the other hand, Transformers with global self-attention modules have been proposed to model the interdependencies among all instances. However, in this paper we question: Is global relation modeling using self-attention necessary, or can we appropriately restrict self-attention calculations to local regimes in large-scale whole slide images (WSIs)? We propose a general-purpose local attention graph-based Transformer for MIL (LA-MIL), introducing an inductive bias by explicitly contextualizing instances in adaptive local regimes of arbitrary size. Additionally, an efficiently adapted loss function enables our approach to learn expressive WSI embeddings for the joint analysis of multiple biomarkers. We demonstrate that LA-MIL achieves state-of-the-art results in mutation prediction for gastrointestinal cancer, outperforming existing models on important biomarkers such as microsatellite instability for colorectal cancer. Our findings suggest that local self-attention models dependencies on par with global self-attention modules. Our LA-MIL implementation is available at https://github.com/agentdr1/LA_MIL.
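To make the core idea concrete, the sketch below shows one plausible way to restrict self-attention of WSI tile embeddings to local neighbourhoods defined by a k-nearest-neighbour graph over tile coordinates. This is a minimal illustration under our own assumptions, not the authors' implementation (which is available at the repository above); the names `knn_adjacency`, `LocalSelfAttention`, and hyperparameters such as `k=16` and the 384-d feature size are hypothetical.

```python
# Minimal sketch: graph-masked local self-attention over WSI tile embeddings.
# Assumption: each tile attends only to its k nearest spatial neighbours;
# names and hyperparameters are illustrative, not from the LA-MIL repository.
import torch
import torch.nn as nn


def knn_adjacency(coords: torch.Tensor, k: int) -> torch.Tensor:
    """Boolean (N, N) mask, True for each tile's k nearest neighbours (plus itself)."""
    dist = torch.cdist(coords, coords)              # pairwise Euclidean distances
    idx = dist.topk(k + 1, largest=False).indices   # +1: each tile is its own 0-distance neighbour
    mask = torch.zeros_like(dist, dtype=torch.bool)
    mask.scatter_(1, idx, torch.ones_like(idx, dtype=torch.bool))
    return mask


class LocalSelfAttention(nn.Module):
    """Single-head self-attention restricted to a local neighbourhood mask."""

    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # (N, N) attention logits
        attn = attn.masked_fill(~mask, float("-inf"))  # keep only local neighbours
        attn = attn.softmax(dim=-1)
        return self.proj(attn @ v)


# Usage: 500 tiles with 384-d features and 2-d slide coordinates.
feats = torch.randn(500, 384)
coords = torch.rand(500, 2)
layer = LocalSelfAttention(384)
out = layer(feats, knn_adjacency(coords, k=16))        # (500, 384) contextualized tiles
```

The neighbourhood mask is what makes the attention local: its spatial extent adapts to the tile density of each slide rather than being fixed to a rigid window, which is one way to read the "adaptive local regimes of arbitrary size" described in the abstract.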