The interpretation of small tiles in large whole slide images (WSIs) often requires broader image context. We introduce TICON, a transformer-based tile-representation contextualizer that produces rich, contextualized embeddings for "any" application in computational pathology. Standard tile-encoder pipelines, which extract embeddings of tiles stripped from their context, fail to model the rich slide-level information essential for both local and global tasks. Furthermore, different tile encoders excel at different downstream tasks, so a unified model is needed to contextualize embeddings derived from "any" tile-level foundation model. TICON addresses this need with a single shared encoder, pretrained with a masked-modeling objective to simultaneously unify and contextualize representations from diverse tile-level pathology foundation models. Our experiments demonstrate that TICON-contextualized embeddings significantly improve performance across many different tasks, establishing new state-of-the-art results on tile-level benchmarks (i.e., HEST-Bench, THUNDER, CATCH) and slide-level benchmarks (i.e., Patho-Bench). Finally, we pretrain an aggregator on top of TICON to form a slide-level foundation model using only 11K WSIs, outperforming SoTA slide-level foundation models pretrained with up to 350K WSIs.
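To make the pretraining setup concrete, below is a minimal PyTorch sketch of a masked tile-embedding contextualizer in the spirit described above: frozen tile-encoder embeddings are projected into a shared space, a random subset is replaced by a learned mask token, a shared transformer encoder contextualizes all tiles, and the masked embeddings are reconstructed. This is an illustrative sketch, not the authors' implementation; all module names, dimensions, the masking ratio, and the MSE objective are assumptions, and positional encodings of tile coordinates are omitted for brevity.

```python
import torch
import torch.nn as nn

class TICONSketch(nn.Module):
    """Hypothetical sketch of a masked tile-embedding contextualizer.

    A shared transformer encoder maps pre-extracted tile embeddings
    (from any tile-level foundation model) into contextualized
    embeddings; training masks a random subset of tiles and
    reconstructs their original embeddings from slide context.
    """

    def __init__(self, tile_dim=1024, model_dim=768, depth=12, heads=12):
        super().__init__()
        # Projection unifies a tile encoder's embedding space into
        # the shared model dimension (one such head per encoder).
        self.proj_in = nn.Linear(tile_dim, model_dim)
        # Learned token substituted for masked tiles.
        self.mask_token = nn.Parameter(torch.zeros(1, 1, model_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=model_dim, nhead=heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Reconstruction head predicts the original tile embedding.
        self.proj_out = nn.Linear(model_dim, tile_dim)

    def forward(self, tiles, mask):
        # tiles: (B, N, tile_dim) frozen tile-encoder embeddings
        # mask:  (B, N) boolean, True where the tile is masked out
        x = self.proj_in(tiles)
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        x = self.encoder(x)       # contextualized tile embeddings
        return self.proj_out(x)   # reconstructed tile embeddings


# Illustrative masked-modeling step: reconstruct masked tile
# embeddings from the surrounding slide context (50% masking
# ratio chosen arbitrarily for this example).
model = TICONSketch()
tiles = torch.randn(2, 196, 1024)           # 2 slides, 196 tiles each
mask = torch.rand(2, 196) < 0.5
recon = model(tiles, mask)
loss = ((recon - tiles) ** 2)[mask].mean()  # MSE on masked tiles only
loss.backward()
```

At inference, the mask would simply be all-False, and the encoder's output (before `proj_out`) would serve as the contextualized embedding fed to downstream tile-level probes or to a slide-level aggregator.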