The fovea is an important anatomical landmark of the retina. Detecting the location of the fovea is essential for the analysis of many retinal diseases. However, robust fovea localization remains challenging because the fovea region often appears fuzzy, and retinal diseases may further obscure its appearance. This paper proposes a novel vision transformer (ViT) approach that integrates information both inside and outside the fovea region to achieve robust fovea localization. Our proposed network, named Bilateral-Vision-Transformer (Bilateral-ViT), consists of two branches: a transformer-based main branch that integrates global context across the entire fundus image, and a vessel branch that explicitly incorporates the structure of the blood vessels. The encoded features from both branches are then merged by a customized multi-scale feature fusion (MFF) module. Comprehensive experiments demonstrate that the proposed approach is significantly more robust on diseased images and establishes a new state of the art on both the Messidor and PALM datasets.
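The abstract does not specify how the MFF module combines the two branches. The sketch below is only a rough illustration of the general idea of merging two encoder branches across scales. It is not the authors' MFF design. It makes several assumptions: each branch yields a list of `(C, H, W)` feature maps ordered from finest to coarsest, coarser maps are upsampled to the finest resolution with nearest-neighbour repetition, and a 1x1 convolution with random weights reduces the channels. The helpers `upsample_nearest` and `fuse_multiscale` are hypothetical names introduced here.

```python
import numpy as np


def upsample_nearest(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) feature map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_multiscale(main_feats, vessel_feats, out_channels, rng):
    """Hypothetical two-branch multi-scale fusion (illustrative, not the paper's MFF).

    main_feats, vessel_feats: lists of (C_i, H_i, W_i) arrays, finest scale first.
    Each coarser scale is assumed to divide the finest resolution evenly.
    """
    target_h, target_w = main_feats[0].shape[1:]
    fused = []
    for m, v in zip(main_feats, vessel_feats):
        # Both branches must agree spatially at each scale.
        assert m.shape[1:] == v.shape[1:]
        # Channel-wise concatenation of the main-branch and vessel-branch features.
        pair = np.concatenate([m, v], axis=0)
        assert target_h % pair.shape[1] == 0 and target_w % pair.shape[2] == 0
        factor = target_h // pair.shape[1]
        # Bring every scale to the finest resolution before merging.
        fused.append(upsample_nearest(pair, factor))
    stacked = np.concatenate(fused, axis=0)  # (sum of channels, H, W)
    # A 1x1 convolution is a matrix product over the channel axis;
    # random weights stand in for learned parameters.
    weight = rng.standard_normal((out_channels, stacked.shape[0]))
    return np.einsum("oc,chw->ohw", weight, stacked)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    main = [rng.standard_normal((8, 32, 32)), rng.standard_normal((16, 16, 16))]
    vessel = [rng.standard_normal((4, 32, 32)), rng.standard_normal((4, 16, 16))]
    out = fuse_multiscale(main, vessel, out_channels=1, rng=rng)
    print(out.shape)  # (1, 32, 32)
```

Concatenating before upsampling keeps the two branches' features aligned at each scale. The final 1x1 projection is a common, inexpensive way to mix channels, but a learned model would train these weights rather than sample them.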