Domain shift, the mismatch between the characteristics of training and test data, significantly degrades predictive performance in multi-source imaging scenarios. In medical imaging, heterogeneity in populations, scanners, and acquisition protocols across sites poses a major domain shift challenge and has limited the widespread clinical adoption of machine learning models. Harmonization methods, which aim to learn a representation of the data that is invariant to these differences, are the prevalent tools for addressing domain shift, but they typically degrade predictive accuracy. This paper takes a different perspective on the problem: we embrace this disharmony in the data and design a simple but effective framework for tackling domain shift. The key idea, based on our theoretical arguments, is to build a pretrained classifier on the source data and adapt this model to new data. The classifier can be fine-tuned for intra-site domain adaptation. We can also tackle situations where we do not have access to ground-truth labels on the target data: we show how one can use auxiliary tasks for adaptation. These tasks employ covariates such as age, gender, and race, which are easy to obtain but nevertheless correlated with the main task. We demonstrate substantial improvements in both intra-site domain adaptation and inter-site domain generalization on large-scale real-world 3D brain MRI datasets for classifying Alzheimer's disease and schizophrenia.
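The pretrain-then-adapt idea can be illustrated with a minimal sketch. It is not the paper's implementation: a small logistic regression stands in for the deep classifier, the two "sites" are synthetic Gaussian data whose features differ by a constant offset, and every name (`make_site`, `train`, `accuracy`) is hypothetical. Only the labeled fine-tuning path is shown; adaptation through auxiliary tasks is not.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(x, w, b):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(xs, ys, w=None, b=0.0, lr=0.1, epochs=200):
    """Full-batch gradient descent on the logistic loss.

    Passing (w, b) from an earlier run turns this into fine-tuning.
    """
    w = [0.0] * len(xs[0]) if w is None else list(w)
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = predict(x, w, b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(xs, ys, w, b):
    hits = sum((predict(x, w, b) >= 0.5) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(xs)


def make_site(n, offset, rng):
    """Toy site: two Gaussian classes, first feature shifted by `offset`."""
    xs, ys = [], []
    for i in range(n):
        y = i % 2
        center = (2.0 if y else -2.0) + offset
        xs.append([rng.gauss(center, 1.0), rng.gauss(0.0, 1.0)])
        ys.append(y)
    return xs, ys


rng = random.Random(0)
source_x, source_y = make_site(400, 0.0, rng)   # source site
target_x, target_y = make_site(400, 3.0, rng)   # target site with shifted features

# Assumption: a small labeled subset of the target site is available for adaptation.
adapt_x, adapt_y = target_x[:40], target_y[:40]
test_x, test_y = target_x[40:], target_y[40:]

# 1) Pretrain on the source site only.
w_src, b_src = train(source_x, source_y)
acc_before = accuracy(test_x, test_y, w_src, b_src)

# 2) Fine-tune the pretrained weights on the small target subset.
w_ft, b_ft = train(adapt_x, adapt_y, w=w_src, b=b_src, epochs=300)
acc_after = accuracy(test_x, test_y, w_ft, b_ft)

print(f"target accuracy before adaptation: {acc_before:.2f}")
print(f"target accuracy after fine-tuning: {acc_after:.2f}")
```

In this toy setting the source-only decision boundary sits in the wrong place for the shifted site, and a few labeled target samples are enough to move it. Real imaging data would call for a deep network and a more careful choice of which layers to adapt.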