Recently, many attempts have been made to construct transformer-based U-shaped architectures, and new methods have been proposed that outperform their CNN-based rivals. However, serious problems such as blockiness and cropped edges in predicted masks remain because of the transformers' patch partitioning operations. In this work, we propose a new U-shaped architecture for medical image segmentation built on the recently introduced focal modulation mechanism. The proposed architecture has asymmetric depths for the encoder and decoder. Because the focal module can aggregate both local and global features, our model benefits simultaneously from the wide receptive field of transformers and the local view of CNNs. This balance between local and global feature usage allows the proposed method to outperform Swin-UNet, one of the most powerful transformer-based U-shaped models. We achieve a 1.68% higher DICE score and a 0.89 better HD metric on the Synapse dataset. With extremely limited data, we also achieve a 4.25% higher DICE score on the NeoPolyp dataset. Our implementation is available at: https://github.com/givkashi/Focal-UNet
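The DICE score used for the comparisons above is not defined in this abstract. As a point of reference, the following is a minimal sketch of the standard Dice coefficient for a single pair of binary masks. The function name `dice_score`, the nested-list mask format, and the convention of returning 1.0 for two empty masks are illustrative assumptions and are not taken from the authors' repository; reported results on multi-class datasets are typically averaged over classes, which this sketch does not do.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1.

    Illustrative helper (hypothetical name), not the authors' implementation.
    """
    # Flatten both masks so they can be compared element by element.
    pred_flat = [int(v) for row in pred for v in row]
    target_flat = [int(v) for row in target for v in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must have the same number of elements")

    # Overlap: pixels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred_flat, target_flat))
    total = sum(pred_flat) + sum(target_flat)

    # Assumed convention: two empty masks agree perfectly.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [[1, 1, 0],
                 [0, 1, 0]]
    ground_truth = [[1, 0, 0],
                    [0, 1, 1]]
    # 2 overlapping pixels, 3 foreground pixels in each mask: 2*2 / (3+3)
    print(round(dice_score(predicted, ground_truth), 4))
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no foreground pixels; a percentage-point difference such as the ones reported above corresponds to this value multiplied by 100.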