This work demonstrates the ability to produce readily interpretable statistical metrics for model fit, fixed-effects covariate coefficients, and prediction confidence. Importantly, it compares four suitable and commonly applied epistemic UQ approaches (BNN, SWAG, MC dropout, and ensembles) in their ability to calculate these statistical metrics for ARMED MEDL models. In our experiment on AD prognosis, the UQ methods not only provide these benefits, but several also maintain the high performance of the original ARMED method, and some even provide a modest (but not statistically significant) performance improvement. The ensemble models, especially the ensemble with 90% subsampling, performed well across all metrics we tested: they (1) achieved high performance comparable to the non-UQ ARMED model, (2) properly deweighted the confound probes and assigned them statistically insignificant p-values, and (3) attained relatively high calibration of the output prediction confidence. Based on these results, the ensemble approaches, especially with 90% subsampling, provided the best all-round performance for prediction and uncertainty estimation and achieved our goals of providing statistical significance for model fit, statistical significance for covariate coefficients, and confidence in prediction, while maintaining the baseline performance of MEDL using ARMED.