The lack of non-parametric statistical tests for confounding bias significantly hampers the development of robust, valid and generalizable predictive models in many fields of research. Here I propose the partial and full confounder tests, which, for a given confounder variable, probe the null hypotheses of unconfounded and fully confounded models, respectively. The tests provide strict control of type I errors and high statistical power, even for the non-normally and non-linearly dependent predictions often seen in machine learning. Applying the proposed tests to models trained on functional brain connectivity data from the Human Connectome Project and the Autism Brain Imaging Data Exchange dataset reveals confounders that were previously unreported or found to be hard to correct for with state-of-the-art confound mitigation approaches. The tests, implemented in the package mlconfound (https://mlconfound.readthedocs.io), can aid the assessment and improvement of the generalizability and neurobiological validity of predictive models and, thereby, foster the development of clinically useful machine learning biomarkers.
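To illustrate the idea behind the partial confounder test, the following is a minimal, simplified sketch, not the method of the paper: the partial test probes H0: ŷ ⊥ c | y (an unconfounded model). This toy version residualizes both the prediction ŷ and the confounder c on y with ordinary least squares and runs a plain permutation test on the residual correlation; the actual mlconfound package instead uses a conditional-permutation scheme that also handles non-linear and non-normal dependencies. All function names below are hypothetical, introduced only for this sketch.

```python
# Toy sketch of the partial confounder test idea (NOT the mlconfound method):
# H0 (unconfounded model): yhat is independent of the confounder c, given y.
# Linear residualization + naive permutation test; for real analyses use the
# conditional-permutation tests in the mlconfound package.
import random

def _residualize(x, y):
    # Residuals of regressing x on y (one-predictor ordinary least squares).
    n = len(x)
    my, mx = sum(y) / n, sum(x) / n
    cov = sum((yi - my) * (xi - mx) for xi, yi in zip(x, y))
    var = sum((yi - my) ** 2 for yi in y)
    b = cov / var if var else 0.0
    a = mx - b * my
    return [xi - (a + b * yi) for xi, yi in zip(x, y)]

def _corr(u, v):
    # Pearson correlation of two equal-length sequences.
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den if den else 0.0

def partial_confound_test_sketch(y, yhat, c, n_perm=1000, seed=0):
    """Permutation p-value for H0: yhat independent of c given y (linear toy)."""
    r_yhat = _residualize(yhat, y)  # yhat with the linear effect of y removed
    r_c = _residualize(c, y)        # c with the linear effect of y removed
    observed = abs(_corr(r_yhat, r_c))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = r_c[:]
        rng.shuffle(perm)  # break any remaining yhat–c association under H0
        if abs(_corr(r_yhat, perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # permutation p-value with +1 correction

# Synthetic example: a confounded model (yhat tracks c beyond what y explains)
# should yield a near-zero p-value; an unconfounded model, a clearly larger one.
rng = random.Random(42)
y = [rng.gauss(0, 1) for _ in range(200)]
c = [0.5 * yi + rng.gauss(0, 1) for yi in y]
yhat_conf = [0.5 * yi + 0.8 * ci + rng.gauss(0, 0.3) for yi, ci in zip(y, c)]
yhat_clean = [0.9 * yi + rng.gauss(0, 0.3) for yi in y]
print(partial_confound_test_sketch(y, yhat_conf, c))   # small p: H0 rejected
print(partial_confound_test_sketch(y, yhat_clean, c))  # larger p: H0 retained
```

The mlconfound package exposes analogous functionality with proper conditional permutations, which is what gives the tests their validity under non-linear and non-normal dependence.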