Temporally aware image representations are crucial for capturing disease progression in 3D volumes from longitudinal medical datasets. However, recent state-of-the-art self-supervised learning approaches such as Masked Autoencoding (MAE), despite their strong representation learning capabilities, lack temporal awareness. In this paper, we propose STAMP (Stochastic Temporal Autoencoder with Masked Pretraining), a Siamese MAE framework that encodes temporal information through a stochastic process by conditioning on the time difference between the two input volumes. Unlike deterministic Siamese approaches, which compare scans from different time points but fail to account for the inherent uncertainty in disease evolution, STAMP learns temporal dynamics stochastically by reframing the MAE reconstruction loss as a conditional variational inference objective. We evaluated STAMP on two OCT datasets and one MRI dataset, each with multiple visits per patient. STAMP-pretrained ViT models outperformed both existing temporal MAE methods and foundation models on late-stage Age-Related Macular Degeneration and Alzheimer's Disease progression prediction tasks, which require models to learn the underlying non-deterministic temporal dynamics of the diseases.
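The conditional variational objective described above can be illustrated with a minimal NumPy sketch: a latent distribution conditioned on the time gap between visits is sampled via the reparameterization trick, and the loss combines a reconstruction term with a closed-form KL divergence against a standard normal prior. All names, shapes, and the toy decoder stand-in are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar):
    # Sample z = mu + sigma * eps (reparameterization trick),
    # so the sampling step stays differentiable in a real model.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def conditional_elbo_loss(recon, target, mu, logvar):
    # Negative ELBO: masked-patch reconstruction error plus
    # KL(q(z | x, dt) || N(0, I)) in closed form.
    recon_loss = np.mean((recon - target) ** 2)
    kl = -0.5 * np.mean(1.0 + logvar - mu**2 - np.exp(logvar))
    return recon_loss + kl

# Toy latent statistics conditioned on the time gap dt (hypothetical).
dt = 0.5                       # normalized time difference between visits
mu = 0.1 * dt * np.ones(8)     # latent mean, scaled by dt for illustration
logvar = np.zeros(8)           # unit log-variance
z = reparameterize(mu, logvar)

target = rng.standard_normal(16)
recon = target + 0.01          # stand-in for a decoder's masked-patch output
loss = conditional_elbo_loss(recon, target, mu, logvar)
print(loss)
```

In the actual framework the latent statistics would come from a network conditioned on both the encoded scans and the visit time difference; the sketch only shows how the reconstruction and KL terms interact in the loss.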