Self-supervised learning (SSL) opens up huge opportunities for medical image analysis, a field well known for its scarcity of annotations. However, aggregating massive unlabeled 3D medical images, such as computed tomography (CT) scans, remains challenging due to high imaging costs and privacy restrictions. In this paper, we advocate bringing in a wealth of 2D images, such as chest X-rays, to compensate for the lack of 3D data, aiming to build a universal medical self-supervised representation learning framework, called UniMiSS. The key question is how to break the dimensionality barrier, \ie, how to perform SSL with both 2D and 3D images. To achieve this, we design a pyramid U-like medical Transformer (MiT), composed of a switchable patch embedding (SPE) module and Transformers. The SPE module adaptively switches to either 2D or 3D patch embedding, depending on the input dimension, and the embedded patches are converted into a sequence regardless of their original dimensionality. The Transformers model long-term dependencies in a sequence-to-sequence manner, thus enabling UniMiSS to learn representations from both 2D and 3D images. With MiT as the backbone, we perform UniMiSS pre-training in a self-distillation manner. We conduct extensive experiments on six 3D/2D medical image analysis tasks, including segmentation and classification. The results show that the proposed UniMiSS achieves promising performance on various downstream tasks, substantially outperforming ImageNet pre-training and other advanced SSL counterparts. Code is available at \def\UrlFont{\rm\small\ttfamily} \url{https://github.com/YtongXie/UniMiSS-code}.
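To make the SPE idea concrete, the following is a minimal NumPy sketch (not the paper's implementation; all names, patch sizes, and the use of separate linear projections for 2D and 3D patches are illustrative assumptions). It shows how a single module can map either a 2D image or a 3D volume into one token sequence of a shared embedding dimension, so the same Transformer can consume both:

```python
import numpy as np

class SwitchablePatchEmbedding:
    """Hypothetical sketch of a switchable patch embedding (SPE):
    switch between a 2D and a 3D patch-embedding path based on the
    input's dimensionality, projecting both into the same token space."""

    def __init__(self, patch=4, embed_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        self.patch = patch
        # Separate projection weights for 2D (p*p) and 3D (p*p*p) patches,
        # both mapping into the same embedding dimension.
        self.w2d = rng.standard_normal((patch * patch, embed_dim)) * 0.02
        self.w3d = rng.standard_normal((patch ** 3, embed_dim)) * 0.02

    def __call__(self, x):
        p = self.patch
        if x.ndim == 2:                        # (H, W) image, e.g. an X-ray
            H, W = x.shape
            patches = (x.reshape(H // p, p, W // p, p)
                        .transpose(0, 2, 1, 3)
                        .reshape(-1, p * p))
            return patches @ self.w2d          # (num_patches, embed_dim)
        elif x.ndim == 3:                      # (D, H, W) volume, e.g. a CT scan
            D, H, W = x.shape
            patches = (x.reshape(D // p, p, H // p, p, W // p, p)
                        .transpose(0, 2, 4, 1, 3, 5)
                        .reshape(-1, p ** 3))
            return patches @ self.w3d          # (num_patches, embed_dim)
        raise ValueError("expected a 2D image or a 3D volume")

spe = SwitchablePatchEmbedding()
tokens_2d = spe(np.zeros((32, 32)))       # 2D input -> (64, 32) tokens
tokens_3d = spe(np.zeros((16, 32, 32)))   # 3D input -> (256, 32) tokens
print(tokens_2d.shape, tokens_3d.shape)
```

Because both branches emit sequences of the same embedding width, the downstream Transformer blocks need no dimension-specific changes, which is the property that lets 2D and 3D data share one backbone during pre-training.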