In this paper, we consider the problem of disease diagnosis. Unlike the conventional learning paradigm that treats labels independently, we propose a knowledge-enhanced framework that trains visual representations under the guidance of medical domain knowledge. In particular, we make the following contributions. First, to explicitly incorporate experts' knowledge, we propose to learn a neural representation of the medical knowledge graph via contrastive learning, implicitly establishing relations between different medical concepts. Second, while training the visual encoder, we keep the parameters of the knowledge encoder frozen and propose to learn a set of prompt vectors for efficient adaptation. Third, we adopt a Transformer-based disease-query module for cross-modal fusion, whose cross-attention maps naturally enable explainable diagnosis results. To validate the effectiveness of the proposed framework, we conduct thorough experiments on three X-ray imaging datasets covering different anatomical structures. The results show that our model is able to exploit the implicit relations between diseases/findings, and thus benefits two problems commonly encountered in the medical domain, namely long-tailed and zero-shot recognition, on which conventional methods either struggle or fail entirely.
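The disease-query idea above can be illustrated with a minimal, self-contained sketch: a set of query vectors (one per disease, in the paper derived from the knowledge encoder) cross-attends over spatial visual features, and the resulting attention map localizes the evidence behind each prediction. This is an assumption-laden toy version in numpy, not the paper's implementation; all names, dimensions, and the single-head formulation are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def disease_query_attention(queries, visual_feats):
    """Single-head cross attention: each disease query attends over
    patch-level visual features. The attention weights double as a
    per-disease explanation map over image regions."""
    d = queries.shape[-1]
    scores = queries @ visual_feats.T / np.sqrt(d)  # (num_diseases, num_patches)
    attn = softmax(scores, axis=-1)                 # each row sums to 1
    fused = attn @ visual_feats                     # (num_diseases, d) fused features
    return fused, attn

# Toy shapes: 4 disease queries attending over 49 patch features of dim 32.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 32))    # hypothetical knowledge-derived disease queries
V = rng.normal(size=(49, 32))   # hypothetical patch features from a visual encoder
fused, attn = disease_query_attention(Q, V)
```

In a full model, `fused` would be passed to a per-disease classification head, while `attn` can be reshaped to the patch grid and overlaid on the X-ray for visual explanation.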