Despite the great promise that machine learning has shown in many fields of medicine, it has also raised concerns about potential biases and poor generalization across genders, age distributions, races and ethnicities, hospitals, and data acquisition equipment and protocols. In the current study, in the context of three brain diseases, we provide experimental evidence that, when properly trained, machine learning models can generalize well across diverse conditions and do not suffer from biases. Specifically, using multi-study magnetic resonance imaging consortia for diagnosing Alzheimer's disease, schizophrenia, and autism spectrum disorder, we find that the accuracy of well-trained models is consistent across subgroups defined by attributes such as gender, age, and race, as well as across different clinical studies. We find that models that incorporate multi-source data, namely demographic, clinical, and genetic factors and cognitive scores, are also unbiased. In some cases these models have better predictive accuracy across subgroups than those trained only with structural measures, but there are also situations in which the additional features do not help.