SwinVFTR: A Novel Volumetric Feature-learning Transformer for 3D OCT Fluid Segmentation

Accurately segmenting fluid in 3D volumetric optical coherence tomography (OCT) images is a crucial yet challenging task for detecting eye diseases. Traditional autoencoding-based segmentation approaches are limited in extracting fluid regions because resolution is lost successively in the encoding phase and the lost information cannot be recovered in the decoding phase. Although current transformer-based models for medical image segmentation address this limitation, they are not designed to be applied out-of-the-box to 3D OCT volumes, whose channel-axis size varies widely across vendor devices and extraction techniques. To address these issues, we propose SwinVFTR, a new transformer-based architecture designed for precise fluid segmentation in 3D volumetric OCT images. We first utilize channel-wise volumetric sampling for training on OCT volumes with varying depths (numbers of B-scans). Next, the model uses a novel shifted-window transformer block in the encoder to achieve better localization and segmentation of fluid regions. We also propose a new volumetric attention block for spatial and depth-wise attention, which improves upon traditional residual skip connections. Trained with a multi-class Dice loss, the proposed architecture outperforms other existing architectures on the three publicly available vendor-specific OCT datasets, namely Spectralis, Cirrus, and Topcon, with mean Dice scores of 0.72, 0.59, and 0.68, respectively. SwinVFTR also outperforms other architectures on two additional relevant metrics, mean intersection-over-union (mean IoU) and the structural similarity measure (SSIM).
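
The abstract does not spell out how the channel-wise volumetric sampling works, so the following is only a minimal sketch of one plausible reading: crop a random run of consecutive B-scans of fixed length from each volume, so that scans with different depths yield training inputs of the same shape. The function name `sample_depth_window`, the `(D, H, W)` layout, the contiguous-window choice, and the zero-padding of short volumes are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def sample_depth_window(volume, mask, depth, rng):
    """Crop `depth` consecutive B-scans from a (D, H, W) volume and its mask.

    Assumption: contiguous random window, zero-padding at the end of the
    depth axis when the volume has fewer than `depth` B-scans.
    """
    d = volume.shape[0]
    if d < depth:
        # Pad only the depth axis so every sample has the same shape.
        pad = ((0, depth - d), (0, 0), (0, 0))
        volume = np.pad(volume, pad)
        mask = np.pad(mask, pad)
        d = depth
    # Upper bound is exclusive, so the window always fits inside the volume.
    start = int(rng.integers(0, d - depth + 1))
    return volume[start:start + depth], mask[start:start + depth]

# Usage with two hypothetical volumes of different depth.
rng = np.random.default_rng(0)
for n_bscans in (49, 128):
    vol = np.zeros((n_bscans, 64, 64), dtype=np.float32)
    seg = np.zeros((n_bscans, 64, 64), dtype=np.int64)
    v, s = sample_depth_window(vol, seg, depth=32, rng=rng)
    print(n_bscans, v.shape, s.shape)
```

Sampling the image and its label mask with the same window keeps the two aligned, which is the one property any such scheme needs regardless of how the window is chosen.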

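
For readers unfamiliar with the reported metrics, the sketch below computes mean Dice and mean IoU from hard label maps, using the standard definitions Dice = 2|P∩T| / (|P| + |T|) and IoU = |P∩T| / |P∪T|. The function name `mean_dice_and_iou`, the exclusion of background class 0 from the average, and the smoothing constant `eps` are assumptions; the abstract does not state how the authors average over classes. This is an evaluation metric on discrete predictions, not the differentiable multi-class Dice loss used for training.

```python
import numpy as np

def mean_dice_and_iou(pred, target, num_classes, eps=1e-7):
    """Return (mean Dice, mean IoU) over foreground classes 1..num_classes-1.

    Assumption: class 0 is background and is excluded; a class absent from
    both maps scores 1.0 because of the eps smoothing.
    """
    dice, iou = [], []
    for c in range(1, num_classes):
        p = pred == c
        t = target == c
        inter = np.logical_and(p, t).sum()
        total = p.sum() + t.sum()
        union = total - inter
        dice.append((2 * inter + eps) / (total + eps))
        iou.append((inter + eps) / (union + eps))
    return float(np.mean(dice)), float(np.mean(iou))

# Usage: one fluid class, prediction over-segments by one voxel.
pred = np.array([1, 1, 0, 0])
target = np.array([1, 0, 0, 0])
d, j = mean_dice_and_iou(pred, target, num_classes=2)
print(round(d, 3), round(j, 3))  # 0.667 0.5
```

Dice is never lower than IoU for the same prediction, so the two metrics rank models similarly but are not interchangeable when comparing reported numbers.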