Multi-modal imaging is a key healthcare technology in the diagnosis and management of disease, but it is often underutilized due to the costs associated with multiple separate scans. This limitation creates a need to synthesize unacquired modalities from the subset of available modalities. In recent years, generative adversarial network (GAN) models, with their superior depiction of structural details, have been established as the state of the art in numerous medical image synthesis tasks. However, GANs are characteristically based on convolutional neural network (CNN) backbones that perform local processing with compact filters. This inductive bias, in turn, compromises learning of long-range spatial dependencies. While attention maps incorporated in GANs can multiplicatively modulate CNN features to emphasize critical image regions, their capture of global context is mostly implicit. Here, we propose a novel generative adversarial approach for medical image synthesis, ResViT, that combines the local precision of convolution operators with the contextual sensitivity of vision transformers. Based on an encoder-decoder architecture, ResViT employs a central bottleneck comprising novel aggregated residual transformer (ART) blocks that synergistically combine convolutional and transformer modules. Comprehensive demonstrations are performed for synthesizing missing sequences in multi-contrast MRI, and for synthesizing CT images from MRI. Our results indicate that ResViT outperforms competing methods in terms of both qualitative observations and quantitative metrics.
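To make the contrast between local convolutional processing and global attention concrete, the following is a minimal one-dimensional toy sketch in pure Python. It is not the ResViT implementation and not the ART block itself: the function names (`conv1d_same`, `self_attention`, `toy_residual_block`), the scalar "tokens", the identity query/key/value projections, and the ordering of the two branches are all assumptions made for illustration only.

```python
import math


def conv1d_same(x, kernel):
    """Local processing: each output mixes only neighbours inside the
    kernel window (zero padding at the borders)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def self_attention(x):
    """Global processing: every position attends to every other position.
    Toy single-head attention on scalars; query, key and value projections
    are the identity (an illustrative simplification)."""
    out = []
    for q in x:
        scores = [q * k for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w / total * v for w, v in zip(weights, x)))
    return out


def toy_residual_block(x, kernel):
    """Hypothetical block: an attention branch with a skip connection,
    followed by a convolution, wrapped in a second skip connection."""
    attended = [a + b for a, b in zip(x, self_attention(x))]
    refined = conv1d_same(attended, kernel)
    return [a + b for a, b in zip(x, refined)]


if __name__ == "__main__":
    # A single impulse at position 0 of a length-9 signal.
    signal = [1.0] + [0.0] * 8
    box = [1.0 / 3.0] * 3
    print("conv only     :", conv1d_same(signal, box))
    print("attention only:", self_attention(signal))
    print("combined block:", toy_residual_block(signal, box))
```

With an impulse at one end of the signal, the width-3 convolution leaves the far end at zero, whereas the attention output at the far end is nonzero because every position is averaged over the whole sequence. This mirrors, in a very reduced form, the motivation stated in the abstract for pairing convolutional and transformer modules.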