Multimodal information is frequently available in medical tasks; by combining information from multiple sources, clinicians can make more accurate judgments. In recent years, several imaging techniques have entered clinical practice for retinal analysis: 2D fundus photography, 3D optical coherence tomography (OCT), 3D OCT angiography, etc. This paper investigates three deep-learning-based multimodal information fusion strategies for retinal analysis tasks: early fusion, intermediate fusion, and hierarchical fusion. The commonly used early and intermediate fusions are simple but do not fully exploit the complementary information between modalities. We developed a hierarchical fusion approach that combines features across multiple dimensions of the network and explores the correlation between modalities. These approaches were applied to glaucoma and diabetic retinopathy classification, using the public GAMMA dataset (fundus photographs and OCT) and a private dataset of PlexElite 9000 (Carl Zeiss Meditec Inc.) OCT angiography acquisitions, respectively. Our hierarchical fusion method performed best in both cases and paves the way for better clinical diagnosis.
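To make the three fusion strategies concrete, here is a minimal NumPy sketch of the general idea. It is an illustration only, not the paper's actual architecture: the toy `encoder` (a random linear layer plus ReLU), the vector sizes, and the choice of fusing at two depths for the hierarchical case are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for two modalities (e.g. a fundus photograph and an
# OCT volume), flattened to vectors. Shapes are illustrative only.
fundus = rng.standard_normal(8)
oct_vol = rng.standard_normal(8)

def encoder(x, out_dim, seed):
    """Hypothetical feature extractor: one random linear layer + ReLU."""
    w = np.random.default_rng(seed).standard_normal((out_dim, x.shape[0]))
    return np.maximum(w @ x, 0.0)

# Early fusion: concatenate raw inputs, then a single shared encoder.
early = encoder(np.concatenate([fundus, oct_vol]), 4, seed=1)

# Intermediate fusion: one encoder per modality, then concatenate
# the extracted features before the classifier head.
inter = np.concatenate([encoder(fundus, 4, seed=2),
                        encoder(oct_vol, 4, seed=3)])

# Hierarchical fusion: fuse at several depths, so both shallow and
# deep cross-modal interactions contribute to the final feature.
shallow = np.concatenate([encoder(fundus, 4, seed=4),
                          encoder(oct_vol, 4, seed=5)])
deep = encoder(shallow, 4, seed=6)
hierarchical = np.concatenate([shallow, deep])

print(early.shape, inter.shape, hierarchical.shape)  # (4,) (8,) (12,)
```

The shapes show the structural difference: early fusion yields one jointly encoded feature, intermediate fusion keeps one feature block per modality, and hierarchical fusion aggregates fused features from multiple network depths.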