The advent of the Vision Transformer (ViT) has brought substantial advances on 3D volumetric benchmarks, particularly in 3D medical image segmentation. Concurrently, Multi-Layer Perceptron (MLP) networks have regained popularity among researchers because they achieve results comparable to ViT while dispensing with the heavy self-attention module. This paper introduces a permutable hybrid network for volumetric medical image segmentation, named PHNet, which exploits the complementary strengths of convolutional neural networks (CNNs) and MLPs. PHNet addresses the intrinsic isotropy problem of 3D volumetric data by using both 2D and 3D CNNs to extract local information. In addition, we propose an efficient Multi-Layer Permute Perceptron (MLPP) module, which enhances the original MLP by capturing long-range dependencies while retaining positional information. Extensive experiments validate that PHNet outperforms state-of-the-art methods on two public datasets, COVID-19-20 and Synapse. Moreover, an ablation study demonstrates that PHNet effectively harnesses the strengths of both CNNs and MLPs. The code will be made publicly available upon acceptance.
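To make the axis-permuting idea behind MLPP concrete, the following is a minimal NumPy sketch, not the paper's implementation: each step moves one spatial axis into the mixing position, applies a linear map along that axis to gather long-range context, and then restores the original layout so the other axes keep their positional arrangement. The function names, the per-axis weight shapes, and the summation over axes are illustrative assumptions.

```python
import numpy as np

def permute_mlp_axis(x, weight, axis):
    """Sketch of one permute-MLP step: expose the chosen axis, mix along
    it with a linear map (long-range dependence along that axis), then
    restore the original axis order (positional layout preserved)."""
    x = np.moveaxis(x, axis, -1)     # bring the target axis last
    x = x @ weight                   # linear token mixing along that axis
    return np.moveaxis(x, -1, axis)  # put the axis back where it was

def mlpp_block(x, weights):
    """Illustrative block: mix along depth, height, and width in turn
    and sum the per-axis results to aggregate global context per axis.
    `weights[i]` is a square matrix matching the length of axis i."""
    out = np.zeros_like(x)
    for axis, wgt in enumerate(weights):
        out += permute_mlp_axis(x, wgt, axis)
    return out
```

With identity weights each permutation step is a no-op, so the block simply sums the input once per axis, which makes the layout-preserving property easy to check.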