Convolutional neural networks have enabled significant improvements in medical image-based diagnosis. It is, however, increasingly clear that these models are susceptible to performance degradation when facing spurious correlations and dataset shift, leading, e.g., to underperformance on underrepresented patient groups. In this paper, we compare two classification schemes on the ADNI MRI dataset: a simple logistic regression model using manually selected volumetric features, and a convolutional neural network trained on 3D MRI data. We assess the robustness of the trained models in the face of varying dataset splits, training set sex composition, and stage of disease. In contrast to earlier work in other imaging modalities, we do not observe a clear pattern of improved model performance for the majority group in the training dataset. Instead, while logistic regression is fully robust to dataset composition, we find that CNN performance is generally improved for both male and female subjects when including more female subjects in the training dataset. We hypothesize that this might be due to inherent differences in the pathology of the two sexes. Moreover, in our analysis, the logistic regression model outperforms the 3D CNN, emphasizing the utility of manual feature specification based on prior knowledge, and the need for more robust automatic feature selection.
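To make the notion of varying training set sex composition concrete, the following is a minimal sketch of how training sets with a controlled share of female subjects could be drawn. The abstract does not specify the authors' sampling procedure, so the function name `sample_with_female_fraction`, the record layout, the cohort size, and the chosen ratios are all illustrative assumptions; the cohort is a synthetic placeholder rather than ADNI data.

```python
import random


def sample_with_female_fraction(subjects, n_train, female_fraction, seed=0):
    """Draw n_train subjects so that a fixed share of them is female.

    `subjects` is a list of dicts with a "sex" key ("F" or "M"). This record
    layout is a hypothetical one chosen for illustration, not the ADNI format.
    """
    rng = random.Random(seed)
    females = [s for s in subjects if s["sex"] == "F"]
    males = [s for s in subjects if s["sex"] == "M"]

    # Fix the number of female and male subjects in the training set.
    n_female = round(n_train * female_fraction)
    n_male = n_train - n_female
    if n_female > len(females) or n_male > len(males):
        raise ValueError("not enough subjects of one sex for this ratio")

    # Sample without replacement within each sex, then mix the order.
    train = rng.sample(females, n_female) + rng.sample(males, n_male)
    rng.shuffle(train)
    return train


# Synthetic placeholder cohort: 100 female and 100 male records.
cohort = [{"id": i, "sex": "F" if i % 2 == 0 else "M"} for i in range(200)]

# One training set per sex ratio; the total size stays constant so that
# differences between models cannot be attributed to training set size.
ratios = [0.0, 0.25, 0.5, 0.75, 1.0]
training_sets = {r: sample_with_female_fraction(cohort, 80, r) for r in ratios}
```

Holding the training set size fixed while changing only the ratio is one way to separate the effect of sex composition from the effect of the amount of training data. Each resulting set could then be used to fit both the logistic regression and the CNN, with evaluation reported separately for male and female test subjects.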