Multi-modal imaging is a key healthcare technology that is often underutilized due to the costs of acquiring multiple separate scans. This limitation motivates the synthesis of unacquired modalities from the subset of available ones. In recent years, generative adversarial network (GAN) models, with their superior depiction of structural details, have been established as state of the art in numerous medical image synthesis tasks. GANs are characteristically based on convolutional neural network (CNN) backbones that perform local processing with compact filters, an inductive bias that compromises the learning of contextual features. Here, we propose a novel generative adversarial approach for medical image synthesis, ResViT, that combines the local precision of convolution operators with the contextual sensitivity of vision transformers. ResViT employs a central bottleneck comprising novel aggregated residual transformer (ART) blocks that synergistically fuse convolutional and transformer modules. Comprehensive demonstrations are performed for synthesizing missing sequences in multi-contrast MRI and for synthesizing CT images from MRI. Our results indicate the superiority of ResViT over competing methods in both qualitative observations and quantitative metrics.
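To make the hybrid design concrete, the sketch below shows one way an ART-style bottleneck block could pair a convolutional branch with a self-attention branch. The abstract only states that ART blocks combine convolutional and transformer modules; the channel sizes, token handling, and fusion used here are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of an aggregated residual transformer (ART)-style block,
# assuming a PyTorch implementation. Hyperparameters and the fusion scheme
# are hypothetical choices for illustration only.
import torch
import torch.nn as nn


class ARTBlockSketch(nn.Module):
    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        # Convolutional branch: local processing with compact filters.
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
        )
        # Transformer branch: global self-attention over spatial tokens.
        self.attn_norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Fusion of the two branches back to the original channel count.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Local (convolutional) features with a residual connection.
        local = x + self.conv(x)
        # Flatten spatial positions into a token sequence for attention.
        tokens = x.flatten(2).transpose(1, 2)          # (B, H*W, C)
        tokens = self.attn_norm(tokens)
        ctx, _ = self.attn(tokens, tokens, tokens)     # contextual features
        ctx = ctx.transpose(1, 2).reshape(b, c, h, w)
        # Aggregate local and contextual features, keeping a residual path.
        return x + self.fuse(torch.cat([local, ctx], dim=1))


if __name__ == "__main__":
    block = ARTBlockSketch(channels=64)
    out = block(torch.randn(1, 64, 32, 32))
    print(out.shape)  # torch.Size([1, 64, 32, 32])
```

In this sketch the residual connections let the block default to passing features through unchanged, while the 1x1 fusion convolution learns how much to weight local versus contextual information; this is one plausible reading of "aggregated residual," not a confirmed detail of ResViT.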