Self-supervised Learning from 100 Million Medical Images

Florin C. Ghesu, Bogdan Georgescu, Awais Mansoor, Youngjin Yoo, Dominik Neumann, Pragneshkumar Patel, R. S. Vishwanath, James M. Balter, Yue Cao, Sasa Grbic, Dorin Comaniciu

Building accurate and robust artificial intelligence systems for medical image assessment requires not only the research and design of advanced deep learning models but also the creation of large, curated sets of annotated training examples. Constructing such datasets, however, is often very costly, due to the complex nature of annotation tasks and the high level of expertise required to interpret medical images (e.g., expert radiologists). To counter this limitation, we propose a method for self-supervised learning of rich image features based on contrastive learning and online feature clustering. For this purpose we leverage large training datasets of over 100,000,000 medical images of various modalities, including radiography, computed tomography (CT), magnetic resonance (MR) imaging and ultrasonography. We propose to use these features to guide model training in supervised and hybrid self-supervised/supervised regimes on various downstream tasks. We highlight a number of advantages of this strategy on challenging image assessment problems in radiography, CT and MR: 1) Significant increase in accuracy compared to the state of the art (e.g., AUC boost of 3-7% for detection of abnormalities from chest radiography scans and hemorrhage detection on brain CT); 2) Acceleration of model convergence during training by up to 85% compared to using no pretraining (e.g., 83% when training a model for detection of brain metastases in MR scans); 3) Increase in robustness to various image augmentations, such as intensity variations, rotations or scaling, reflective of the data variation seen in the field.
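The abstract names the pretraining ingredients (contrastive learning plus online feature clustering) without giving a formulation. As an illustration only, here is a minimal NumPy sketch of one well-known objective that combines the two: the swapped-assignment loss popularized by SwAV, with Sinkhorn-Knopp equal-partition cluster codes. It is an assumption, not the authors' implementation. The function names, the temperature of 0.1, the Sinkhorn epsilon of 0.05, the 3 iterations, and the toy data are all hypothetical choices, and the encoder network, the augmentation pipeline and the gradient updates are omitted.

```python
# Illustrative SwAV-style sketch, NOT the paper's code.
# All names and hyperparameters (temperature=0.1, eps=0.05, n_iters=3) are hypothetical.
import numpy as np


def l2_normalize(x, axis=1, eps=1e-12):
    # Project vectors onto the unit sphere so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(x, axis=1):
    # Numerically stable log-softmax over prototypes.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def sinkhorn(scores, eps=0.05, n_iters=3):
    # Online clustering step: turn B x K image-prototype scores into soft assignments
    # that spread the batch (approximately) evenly over the K prototypes.
    q = np.exp((scores - scores.max()) / eps).T  # K x B; the global shift only rescales
    q /= q.sum()
    k, b = q.shape
    for _ in range(n_iters):
        q /= q.sum(axis=1, keepdims=True)  # each prototype row now sums to 1/K
        q /= k
        q /= q.sum(axis=0, keepdims=True)  # each sample column now sums to 1/B
        q /= b
    return (q * b).T  # B x K, each row is a distribution over prototypes


def swapped_prediction_loss(z1, z2, prototypes, temperature=0.1, eps=0.05):
    # z1, z2: B x D embeddings of two augmented views of the same B images.
    z1, z2 = l2_normalize(z1), l2_normalize(z2)
    c = l2_normalize(prototypes)  # K x D cluster prototypes (learnable in practice)
    s1, s2 = z1 @ c.T, z2 @ c.T  # B x K cosine similarities
    # Cluster codes serve as targets; in a real training loop no gradient flows through them.
    q1, q2 = sinkhorn(s1, eps), sinkhorn(s2, eps)
    logp1 = log_softmax(s1 / temperature)
    logp2 = log_softmax(s2 / temperature)
    # Predict the code of one view from the other view's scores, and vice versa.
    loss = -0.5 * (np.sum(q2 * logp1, axis=1) + np.sum(q1 * logp2, axis=1))
    return loss.mean()


# Toy usage: random features stand in for encoder outputs of two augmented views.
rng = np.random.default_rng(0)
B, D, K = 8, 16, 4
features = rng.normal(size=(B, D))
view1 = features + 0.01 * rng.normal(size=(B, D))
view2 = features + 0.01 * rng.normal(size=(B, D))
prototypes = rng.normal(size=(K, D))
codes = sinkhorn(l2_normalize(view1) @ l2_normalize(prototypes).T)
loss = swapped_prediction_loss(view1, view2, prototypes)
print(codes.shape, loss)
```

In a full pipeline the two views would come from random augmentations of the same scan (for example the intensity, rotation and scaling changes the abstract mentions), and the loss would be minimized with respect to both the encoder and the prototypes. The equal-partition constraint in `sinkhorn` is what keeps all images from collapsing onto a single cluster.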
