Cross-modality magnetic resonance (MR) image synthesis aims to produce missing modalities from existing ones. Several methods based on deep neural networks have been developed that use both source and target modalities in a supervised learning manner. However, obtaining a large amount of completely paired multi-modal training data remains challenging, which limits the effectiveness of existing methods. In this paper, we propose a novel Self-supervised Learning-based Multi-scale Transformer Network (SLMT-Net) for cross-modality MR image synthesis, consisting of two stages, i.e., a pre-training stage and a fine-tuning stage. During the pre-training stage, we propose an Edge-preserving Masked AutoEncoder (Edge-MAE), which preserves contextual and edge information by simultaneously performing image reconstruction and edge generation. In addition, a patch-wise loss is proposed to treat input patches differently according to their reconstruction difficulty, measured as the difference between each reconstructed patch and its ground truth. As a result, our Edge-MAE can fully leverage a large amount of unpaired multi-modal data to learn effective feature representations. During the fine-tuning stage, we present a Multi-scale Transformer U-Net (MT-UNet) to synthesize target-modality images, in which a Dual-scale Selective Fusion (DSF) module is proposed to fully integrate multi-scale features extracted from the encoder of the pre-trained Edge-MAE. Moreover, we use the pre-trained encoder as a feature consistency module to measure the difference between the high-level features of the synthesized image and those of the ground truth. Experimental results show the effectiveness of the proposed SLMT-Net, which reliably synthesizes high-quality images even when the training set is partially unpaired. Our code will be publicly available at https://github.com/lyhkevin/SLMT-Net.
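To make the patch-wise loss concrete, below is a minimal PyTorch sketch of one plausible realization: each masked patch's reconstruction error serves as a proxy for its difficulty, and harder patches receive larger (detached) weights. The exact weighting scheme, the `gamma` sharpness parameter, and the function name `patch_wise_loss` are assumptions for illustration, not the paper's verified implementation.

```python
import torch

def patch_wise_loss(pred_patches, target_patches, mask, gamma=1.0):
    """Difficulty-weighted reconstruction loss over masked patches (illustrative sketch).

    pred_patches, target_patches: (B, N, P) flattened image patches
    mask: (B, N), 1 for masked (reconstructed) patches, 0 otherwise
    gamma: sharpness of the difficulty weighting (hypothetical knob)
    """
    # Per-patch mean-squared error as a proxy for reconstruction difficulty
    per_patch_err = ((pred_patches - target_patches) ** 2).mean(dim=-1)  # (B, N)

    # Harder patches (larger error) get larger weights; detach so the weights
    # act as constants and gradients flow only through the error term
    weights = torch.softmax(gamma * per_patch_err.detach(), dim=-1)      # (B, N)

    # Average the weighted error over masked patches only
    loss = (weights * per_patch_err * mask).sum() / mask.sum().clamp(min=1)
    return loss
```

In this sketch, easy patches (small error) contribute little to the objective, so the Edge-MAE spends most of its capacity on patches that are genuinely hard to reconstruct, which is the behavior the abstract describes.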