The limited availability of large image datasets is a major obstacle to developing accurate and generalizable machine learning methods in medicine. The amount of data is limited mainly by differences in acquisition protocols, differences in hardware, and data privacy. At the same time, training a classification model on a small dataset leads to poor generalization. To overcome this issue, image datasets of different provenance are often combined, e.g., in multi-site studies. However, if an additional dataset does not include all classes of the task, the classification model can become biased toward the device or site of acquisition. This is especially the case for Magnetic Resonance (MR) images, where different MR scanners introduce a bias that limits the performance of the model. In this paper, we present a novel method that learns to ignore the scanner-related features present in the images while learning features relevant to the classification task. We focus on a real-world scenario in which only a small dataset provides images of all classes. We exploit this circumstance by introducing specific additional constraints on the latent space, which steer the model toward disease-related rather than scanner-specific features. Our method, Learn to Ignore, outperforms state-of-the-art domain adaptation methods on a multi-site MRI dataset in a classification task between Multiple Sclerosis patients and healthy subjects.