In the last decade, convolutional neural networks (ConvNets) have dominated and achieved state-of-the-art performance in a variety of medical imaging applications. However, the performance of ConvNets is still limited by a lack of understanding of long-range spatial relations in an image. The recently proposed Vision Transformer (ViT) for image classification uses a purely self-attention-based model that learns long-range spatial relations to focus on the relevant parts of an image. Nevertheless, ViT emphasizes low-resolution features because of consecutive downsampling, resulting in a lack of detailed localization information, which makes it unsuitable for image registration. Recently, several ViT-based image segmentation methods have been combined with ConvNets to improve the recovery of detailed localization information. Inspired by them, we present ViT-V-Net, which bridges ViT and ConvNet to provide volumetric medical image registration. The experimental results presented here demonstrate that the proposed architecture achieves superior performance to several top-performing registration methods.
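The long-range modeling that distinguishes ViT from a ConvNet comes from self-attention over patch tokens: every output token is a weighted mixture of all tokens, so two distant image patches can interact directly. The following is a minimal, toy-sized sketch of single-head scaled dot-product self-attention (not the actual ViT-V-Net code; the token count, embedding size, and weight matrices are illustrative assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    # tokens: (n, d) embeddings of flattened image patches.
    # Each output row mixes information from ALL n patches,
    # regardless of their spatial distance in the image.
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])  # (n, n) pairwise relations
    weights = softmax(scores)               # each row sums to 1
    return weights @ V

rng = np.random.default_rng(0)
n, d = 16, 8  # 16 patch tokens with 8-dim embeddings (toy sizes)
tokens = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(tokens, Wq, Wk, Wv)
print(out.shape)  # (16, 8): one updated embedding per patch token
```

A ConvNet layer, by contrast, mixes only a local neighborhood per output; stacking many layers is needed before distant patches influence each other, which is the limitation the abstract refers to.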