The rapid development of diagnostic technologies in healthcare places growing demands on physicians to handle and integrate the heterogeneous yet complementary data produced during routine practice. For instance, personalized diagnosis and treatment planning for a single cancer patient relies on various images (e.g., radiological, pathological, and camera images) and non-image data (e.g., clinical and genomic data). However, such decision-making procedures can be subjective and qualitative, and exhibit large inter-subject variability. With recent advances in multi-modal deep learning, an increasing number of efforts have been devoted to a key question: how do we extract and aggregate multi-modal information to ultimately provide more objective, quantitative computer-aided clinical decision making? This paper reviews recent studies addressing this question. Briefly, this review includes (1) an overview of current multi-modal learning workflows, (2) a summary of multi-modal fusion methods, (3) a discussion of performance, (4) applications in disease diagnosis and prognosis, and (5) challenges and future directions.