Medical Visual Question Answering~(VQA) combines medical artificial intelligence with the popular VQA challenges. Given a medical image and a clinically relevant question in natural language, a medical VQA system is expected to predict a plausible and convincing answer. Although general-domain VQA has been extensively studied, medical VQA still requires dedicated investigation because of its task-specific characteristics. In the first part of this survey, we collect the publicly available medical VQA datasets to date and discuss their data sources, data quantities, and task features. In the second part, we review the approaches used in medical VQA tasks, summarizing and discussing their techniques, innovations, and potential improvements. In the last part, we analyze medical-specific challenges for the field and discuss future research directions. Our goal is to provide comprehensive information for researchers interested in medical artificial intelligence.