We propose a novel deep neural network architecture to learn interpretable representations for medical image analysis. Our architecture generates a global attention map for the region of interest, and then learns bag-of-words-style deep feature embeddings with local attention. The global and local feature maps are combined using a contemporary transformer architecture for highly accurate Gallbladder Cancer (GBC) detection from Ultrasound (USG) images. Our experiments indicate that the detection accuracy of our model exceeds that of human radiologists, supporting its use as a second reader for GBC diagnosis. The bag-of-words embeddings allow our model to be probed for interpretable explanations of GBC detection that are consistent with those reported in the medical literature. We show that the proposed model not only helps explain the decisions of neural network models but also aids in the discovery of new visual features relevant to the diagnosis of GBC. Source code and models will be available at https://github.com/sbasu276/RadFormer
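The fusion described above can be illustrated with a minimal sketch: a global ROI-level embedding and a small bag of local patch embeddings are stacked as tokens and combined with self-attention, with the global token used as the classification embedding. This is a hypothetical toy example in NumPy with made-up dimensions, not the paper's RadFormer implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # scaled dot-product self-attention over a token sequence (toy version,
    # without learned projections)
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[-1])
    return softmax(scores) @ tokens

# Hypothetical feature dimensions, chosen only for illustration.
d = 8
rng = np.random.default_rng(0)
global_feat = rng.normal(size=(1, d))   # global ROI embedding
local_feats = rng.normal(size=(4, d))   # bag-of-words patch embeddings

tokens = np.concatenate([global_feat, local_feats], axis=0)  # (5, d)
fused = self_attention(tokens)                               # transformer-style fusion
cls_embedding = fused[0]                                     # global token feeds the classifier
print(cls_embedding.shape)
```

In the actual model, the global and local branches are learned attention networks and the fusion is a full transformer; the sketch only shows how the two feature streams can interact through attention before classification.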