Deep learning models can be applied successfully to real-world problems; however, training most of these models requires massive amounts of data. Recent methods combine language and vision, but unfortunately they rely on datasets that are usually not publicly available. Here we pave the way for further research in the multimodal language-vision domain for radiology. In this paper, we train a representation learning method that combines local and global representations of language and vision through an attention mechanism, based on the publicly available Indiana University Radiology Report (IU-RR) dataset. Furthermore, we use the learned representations to diagnose five lung pathologies: atelectasis, cardiomegaly, edema, pleural effusion, and consolidation. Finally, we use both supervised and zero-shot classification to extensively analyze the performance of the learned representations on the IU-RR dataset. The average Area Under the Curve (AUC) is used to evaluate the accuracy of the classifiers in classifying the five lung pathologies. The average AUC for classifying the five lung pathologies on the IU-RR test set ranged from 0.85 to 0.87 across the different training datasets, namely CheXpert and CheXphoto. These results compare favorably with other studies using IU-RR. Extensive experiments confirm consistent results for classifying lung pathologies using the multimodal global-local representations of language and vision information.
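The evaluation metric described above, the average AUC over the five pathologies, can be sketched as follows. This is an illustrative example only, not the authors' evaluation code; the function name `mean_auc` and the toy arrays are assumptions for demonstration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# The five lung pathologies evaluated in the paper.
PATHOLOGIES = ["Atelectasis", "Cardiomegaly", "Edema",
               "Pleural Effusion", "Consolidation"]

def mean_auc(y_true, y_score):
    """Macro-average AUC over pathologies.

    y_true:  (n_samples, 5) binary ground-truth labels, one column per finding.
    y_score: (n_samples, 5) predicted probabilities from the classifier.
    """
    aucs = [roc_auc_score(y_true[:, i], y_score[:, i])
            for i in range(y_true.shape[1])]
    return float(np.mean(aucs))

# Toy multi-label example (hypothetical data, not from IU-RR):
y_true = np.array([[1, 0, 1, 0, 1],
                   [0, 1, 0, 1, 0],
                   [1, 1, 0, 0, 1],
                   [0, 0, 1, 1, 0]])
y_score = y_true.astype(float)  # perfect scores -> AUC of 1.0 per class
print(mean_auc(y_true, y_score))
```

A macro average like this weights each pathology equally regardless of its prevalence, which matches how per-class AUCs are typically reported on CheXpert-style label sets.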