Inspired by the recent success of Transformers in Natural Language Processing and of the Vision Transformer in Computer Vision, many researchers in the medical imaging community have turned to Transformer-based networks for mainstream medical tasks such as classification, segmentation, and estimation. In this study, we analyze two recently published Transformer-based network architectures for the task of multimodal head-and-neck tumor segmentation and compare their performance to the de facto standard 3D segmentation network, the nnU-Net. Our results suggest that modeling long-range dependencies may be helpful when large structures are present and/or a large field of view is needed. However, for small structures such as head-and-neck tumors, the convolution-based U-Net architecture performed well, especially when the training dataset is small and computational resources are limited.