Despite the success of multi-modal foundation models pre-trained on large-scale data in natural language understanding and visual recognition, their counterparts in the medical and clinical domains remain preliminary, due to the fine-grained recognition nature of medical tasks and their high demands on domain knowledge. Here, we propose a knowledge-enhanced vision-language pre-training approach for auto-diagnosis on chest X-ray images. The algorithm, named Knowledge-enhanced Auto Diagnosis~(KAD), first trains a knowledge encoder on an existing medical knowledge graph, i.e., it learns neural embeddings of the definitions of and relationships between medical concepts, and then leverages the pre-trained knowledge encoder to guide visual representation learning with paired chest X-rays and radiology reports. We experimentally validate KAD's effectiveness on three external X-ray datasets. The zero-shot performance of KAD is not only comparable to that of fully-supervised models but also, for the first time, superior to the average of three expert radiologists on three (out of five) pathologies, with statistical significance. When few-shot annotation is available, KAD also surpasses all existing approaches in fine-tuning settings, demonstrating its potential for application in different clinical scenarios.
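To make the second stage concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of how a pre-trained knowledge encoder's text embeddings could guide visual representation learning via contrastive alignment, assuming PyTorch, pooled chest X-ray image features, and knowledge-encoder embeddings of report text; the class name `KnowledgeGuidedAligner` and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KnowledgeGuidedAligner(nn.Module):
    """Toy sketch: project image features and knowledge-encoded text
    features into a shared space and align them with a symmetric
    contrastive (InfoNCE-style) objective."""
    def __init__(self, img_dim=512, txt_dim=768, embed_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, embed_dim)
        self.txt_proj = nn.Linear(txt_dim, embed_dim)
        # learnable temperature, initialised near ln(1/0.07)
        self.logit_scale = nn.Parameter(torch.tensor(2.659))

    def forward(self, img_feats, txt_feats):
        # L2-normalise both modalities before computing similarities
        img = F.normalize(self.img_proj(img_feats), dim=-1)
        txt = F.normalize(self.txt_proj(txt_feats), dim=-1)
        logits = self.logit_scale.exp() * img @ txt.t()
        targets = torch.arange(img.size(0), device=img.device)
        # symmetric loss over image->text and text->image directions
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

# Usage with random stand-ins: in practice img_feats would come from an
# image backbone and txt_feats from the pre-trained knowledge encoder.
aligner = KnowledgeGuidedAligner()
img_feats = torch.randn(8, 512)   # pooled chest X-ray features (assumed dim)
txt_feats = torch.randn(8, 768)   # knowledge-encoder text embeddings (assumed dim)
print(aligner(img_feats, txt_feats).item())
```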