Current research on deep learning for medical image segmentation exposes the limitations of existing networks in learning either global semantic information or local contextual information. To tackle these issues, a novel network named SegTransVAE is proposed in this paper. SegTransVAE is built upon an encoder-decoder architecture, exploiting a transformer together with a variational autoencoder (VAE) branch that reconstructs the input images jointly with segmentation. To the best of our knowledge, this is the first method combining the strengths of CNNs, transformers, and VAEs. Evaluation on various recently introduced datasets shows that SegTransVAE outperforms previous methods in Dice Score and $95\%$ Hausdorff Distance while having inference time comparable to that of a simple CNN-based architecture. The source code is available at: https://github.com/itruonghai/SegTransVAE.
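The joint training objective implied above, a segmentation loss combined with the VAE branch's reconstruction and KL regularization terms, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation; the helper names and the weights `w_rec` and `w_kl` are assumptions for exposition.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    # Soft Dice loss over flattened probability maps (standard formulation).
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def vae_losses(recon, image, mu, logvar):
    # L2 reconstruction error plus KL divergence of N(mu, exp(logvar))
    # from a standard normal prior, as in typical VAE regularization.
    l2 = np.mean((recon - image) ** 2)
    kl = -0.5 * np.mean(1.0 + logvar - mu ** 2 - np.exp(logvar))
    return l2, kl

def joint_loss(pred, target, recon, image, mu, logvar,
               w_rec=0.1, w_kl=0.1):
    # Total objective: segmentation loss + weighted VAE terms.
    # w_rec and w_kl are illustrative values, not the paper's settings.
    l2, kl = vae_losses(recon, image, mu, logvar)
    return dice_loss(pred, target) + w_rec * l2 + w_kl * kl
```

With a perfect segmentation, a perfect reconstruction, and a latent posterior matching the prior (`mu = 0`, `logvar = 0`), every term vanishes and the joint loss is zero, which is a quick sanity check for an implementation of this objective.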