A Hybrid Framework Bridging CNN and ViT Based on Theory of Evidence for Diabetic Retinopathy Grading

Diabetic retinopathy (DR) is a leading cause of vision loss among middle-aged and elderly people and significantly affects their daily lives and mental health. To improve the efficiency of clinical screening and enable early detection of DR, a variety of automated DR diagnosis systems based on convolutional neural networks (CNNs) or vision Transformers (ViTs) have recently been developed. However, owing to the inherent limitations of each architecture, the performance of existing methods that rely on a single type of backbone has reached a bottleneck. One potential route to further improvement is to integrate different kinds of backbones so as to leverage their respective strengths (i.e., the local feature extraction capability of CNNs and the global feature capturing ability of ViTs). To this end, we propose a novel paradigm that fuses the features extracted by different backbones based on the theory of evidence. Specifically, the proposed evidential fusion paradigm transforms the features from different backbones into supporting evidence via a set of deep evidential networks. From this supporting evidence, an aggregated opinion is formed, which is used to adaptively tune the fusion pattern between the backbones and thereby boost the performance of our hybrid model. We evaluated our method on two publicly available DR grading datasets. The experimental results demonstrate that our hybrid model not only improves the accuracy of DR grading compared with state-of-the-art frameworks, but also provides excellent interpretability for feature fusion and decision-making.
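To make the idea of evidence-driven fusion more concrete, the following is a minimal sketch and not the method of this paper. It assumes the common evidential deep learning formulation, in which non-negative evidence parameterizes a Dirichlet distribution and yields per-class belief masses plus an uncertainty mass. The function names (`opinion_from_logits`, `fuse_features`), the softplus evidence mapping, and the confidence-weighted fusion rule are all hypothetical choices made for illustration; the abstract does not specify the actual evidential networks or the fusion pattern.

```python
import math


def softplus(x):
    # Numerically stable softplus; maps any real number to a non-negative value.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def opinion_from_logits(logits):
    """Convert raw per-class outputs of one branch into a subjective-logic opinion.

    Assumed formulation: evidence e_k >= 0, Dirichlet parameters alpha_k = e_k + 1,
    strength S = sum(alpha_k), belief b_k = e_k / S, uncertainty u = K / S.
    """
    evidence = [softplus(z) for z in logits]
    num_classes = len(evidence)
    strength = sum(evidence) + num_classes
    belief = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return belief, uncertainty


def fuse_features(feat_cnn, feat_vit, u_cnn, u_vit):
    """Blend two feature vectors, weighting each branch by its confidence (1 - u).

    Hypothetical fusion rule for illustration only.
    """
    c_cnn, c_vit = 1.0 - u_cnn, 1.0 - u_vit
    total = c_cnn + c_vit
    # Fall back to equal weights if neither branch carries any evidence.
    w_cnn = c_cnn / total if total > 0.0 else 0.5
    w_vit = 1.0 - w_cnn
    fused = [w_cnn * a + w_vit * b for a, b in zip(feat_cnn, feat_vit)]
    return fused, (w_cnn, w_vit)


if __name__ == "__main__":
    # Five outputs per branch, one per DR grade (made-up numbers).
    _, u_cnn = opinion_from_logits([4.0, 0.0, 0.0, 0.0, 0.0])
    _, u_vit = opinion_from_logits([0.0, 0.0, 0.0, 0.0, 0.0])
    fused, weights = fuse_features([1.0, 2.0], [3.0, 4.0], u_cnn, u_vit)
    print("uncertainties:", u_cnn, u_vit)
    print("fusion weights:", weights)
    print("fused feature:", fused)
```

In this sketch the belief masses and the uncertainty mass of each branch sum to one, so a branch that produces little evidence has high uncertainty and contributes less to the fused feature. The resulting weights can be inspected directly, which is one simple way such a scheme could support interpretability of the fusion step.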
