Timely and accurate lymphoma diagnosis is essential for guiding cancer treatment. Standard diagnostic practice combines hematoxylin and eosin (HE)-stained whole slide images with immunohistochemistry, flow cytometry, and molecular genetic tests to determine lymphoma subtypes. This process requires costly equipment and skilled personnel, and it delays treatment. Deep learning methods could assist pathologists by extracting diagnostic information from routinely available HE-stained slides, yet comprehensive benchmarks for lymphoma subtyping on multicenter data are lacking. In this work, we present the first multicenter lymphoma benchmarking dataset covering four common lymphoma subtypes and healthy control tissue. We systematically evaluate five publicly available pathology foundation models (H-optimus-1, H0-mini, Virchow2, UNI2, Titan) combined with attention-based (AB-MIL) and transformer-based (TransMIL) multiple instance learning aggregators across three magnifications (10x, 20x, 40x). On in-distribution test sets, models achieve multiclass balanced accuracies exceeding 80% across all magnifications; all foundation models perform similarly, and the two aggregation methods yield comparable results. The magnification study reveals that 40x resolution is sufficient, with no performance gains from higher resolutions or cross-magnification aggregation. On out-of-distribution test sets, however, performance drops substantially to around 60%, highlighting significant generalization challenges. To advance the field, larger multicenter studies covering additional rare lymphoma subtypes are needed. We provide an automated benchmarking pipeline to facilitate such future research.
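
The headline metric above, multiclass balanced accuracy, is commonly defined as the mean of per-class recall, which keeps a frequent class from dominating the score. The following is a minimal sketch under that standard definition, not the benchmarking pipeline's actual code; the function name and the class labels (`control`, `subtype_A`) are hypothetical placeholders.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # number of correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall for each class, then an unweighted average across classes.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced example: a classifier that always predicts "control".
y_true = ["control"] * 8 + ["subtype_A"] * 2
y_pred = ["control"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

In this toy case plain accuracy looks strong at 0.8, while balanced accuracy falls to 0.5 because the minority class is never recognized. That property is why the metric suits subtype classification with uneven class sizes.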


