Robust and Efficient Medical Imaging with Self-Supervision

Shekoofeh Azizi, Laura Culp, Jan Freyberg, Basil Mustafa, Sebastien Baur, Simon Kornblith, Ting Chen, Patricia MacWilliams, S. Sara Mahdavi, Ellery Wulczyn, Boris Babenko, Megan Wilson, Aaron Loh, Po-Hsuan Cameron Chen, Yuan Liu, Pinal Bavishi, Scott Mayer McKinney, Jim Winkens, Abhijit Guha Roy, Zach Beaver, Fiona Ryan, Justin Krogue, Mozziyar Etemadi, Umesh Telang, Yun Liu, Lily Peng, Greg S. Corrado, Dale R. Webster, David Fleet, Geoffrey Hinton, Neil Houlsby, Alan Karthikesalingam, Mohammad Norouzi, Vivek Natarajan

Recent progress in Medical Artificial Intelligence (AI) has delivered systems that can reach clinical-expert-level performance. However, such systems tend to demonstrate sub-optimal "out-of-distribution" performance when evaluated in clinical settings different from the training environment. A common mitigation strategy is to develop separate systems for each clinical setting using site-specific data [1]. However, this quickly becomes impractical because medical data is time-consuming to acquire and expensive to annotate [2]. Thus, the problem of "data-efficient generalization" presents an ongoing difficulty for Medical AI development. Although progress in representation learning shows promise, its benefits have not been rigorously studied, specifically in out-of-distribution settings. To meet these challenges, we present REMEDIS, a unified representation learning strategy to improve the robustness and data-efficiency of medical imaging AI. REMEDIS uses a generic combination of large-scale supervised transfer learning and self-supervised learning, and requires little task-specific customization. We study a diverse range of medical imaging tasks and simulate three realistic application scenarios using retrospective data. REMEDIS exhibits significantly improved in-distribution performance, with up to 11.5% relative improvement in diagnostic accuracy over a strong supervised baseline. More importantly, our strategy leads to strong data-efficient generalization of medical imaging AI, matching strong supervised baselines using between 1% and 33% of retraining data across tasks. These results suggest that REMEDIS can significantly accelerate the life-cycle of medical imaging AI development, thereby presenting an important step forward for medical imaging AI to deliver broad impact.
