Recently, masked image modeling (MIM) has gained considerable attention for its capacity to learn from vast amounts of unlabeled data, and it has been shown to be effective on a wide variety of vision tasks involving natural images. Meanwhile, self-supervised learning is expected to have immense potential for modeling 3D medical images, given the large quantities of unlabeled images and the cost and difficulty of obtaining high-quality labels. However, MIM's applicability to medical images remains uncertain. In this paper, we demonstrate that masked image modeling approaches can advance 3D medical image analysis, not only natural image analysis. Taking 3D medical image segmentation as a representative downstream task, we study how masked image modeling strategies improve performance: i) compared to naive contrastive learning, masked image modeling approaches yield faster convergence of supervised training (1.40$\times$) and ultimately a higher Dice score; ii) predicting raw voxel values with a high masking ratio and a relatively small patch size is a non-trivial self-supervised pretext task for medical image modeling; iii) a lightweight decoder or projection head for reconstruction is powerful for masked image modeling on 3D medical images, speeding up training and reducing cost; iv) finally, we investigate the effectiveness of MIM methods under different practical scenarios in which different image resolutions and labeled data ratios are applied.
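To make finding ii) concrete, the following is a minimal NumPy sketch of the generic masked-voxel pretext task, not the authors' implementation. It splits a 3D volume into non-overlapping cubic patches, hides a high fraction of them at random, and scores a reconstruction with mean squared error on raw voxel values computed only over the masked patches. The helper names (`patchify_3d`, `random_mask`, `masked_voxel_mse`), the 96³ volume, the patch size of 16, and the 0.75 masking ratio are all hypothetical, illustrative choices; the encoder and the lightweight decoder are omitted, and an all-zeros array stands in for the decoder output.

```python
import numpy as np


def patchify_3d(volume, patch_size):
    """Split a (D, H, W) volume into non-overlapping cubic patches.

    Returns an array of shape (num_patches, patch_size**3), where each row
    holds the raw voxel values of one patch.
    """
    d, h, w = volume.shape
    p = patch_size
    assert d % p == 0 and h % p == 0 and w % p == 0, "dims must be divisible by patch_size"
    x = volume.reshape(d // p, p, h // p, p, w // p, p)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # -> (nd, nh, nw, p, p, p)
    return x.reshape(-1, p ** 3)


def random_mask(num_patches, mask_ratio, rng):
    """Boolean mask: True marks a patch hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    perm = rng.permutation(num_patches)
    mask = np.zeros(num_patches, dtype=bool)
    mask[perm[:num_masked]] = True
    return mask


def masked_voxel_mse(pred_patches, target_patches, mask):
    """MSE on raw voxel values, averaged only over the masked patches."""
    per_patch = ((pred_patches - target_patches) ** 2).mean(axis=1)
    return per_patch[mask].mean()


# Illustrative usage with a random stand-in for a 3D scan (hypothetical sizes).
rng = np.random.default_rng(0)
volume = rng.standard_normal((96, 96, 96)).astype(np.float32)
patches = patchify_3d(volume, patch_size=16)       # 6*6*6 = 216 patches of 4096 voxels
mask = random_mask(patches.shape[0], mask_ratio=0.75, rng=rng)
visible = patches[~mask]                           # only these would reach the encoder
pred = np.zeros_like(patches)                      # placeholder for decoder output
loss = masked_voxel_mse(pred, patches, mask)
```

With a 0.75 ratio, the encoder in this sketch would see only 54 of the 216 patches, which is where the training-cost savings of MIM with a lightweight decoder come from; a smaller patch size yields more, finer-grained reconstruction targets for the same volume.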