Risk scores are widely used for clinical decision making and commonly generated from logistic regression models. Machine-learning-based methods may work well for identifying important predictors, but such 'black box' variable selection limits interpretability, and variable importance evaluated from a single model can be biased. We propose a robust and interpretable variable selection approach using the recently developed Shapley variable importance cloud (ShapleyVIC) that accounts for variability across models. Our approach evaluates and visualizes overall variable contributions for in-depth inference and transparent variable selection, and filters out non-significant contributors to simplify model building steps. We derive an ensemble variable ranking from variable contributions, which is easily integrated with an automated and modularized risk score generator, AutoScore, for convenient implementation. In a study of early death or unplanned readmission, ShapleyVIC selected 6 of 41 candidate variables to create a well-performing model, which had similar performance to a 16-variable model from machine-learning-based ranking.
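The two selection steps described above (filtering out non-significant contributors, then deriving an ensemble ranking across models) can be illustrated with a minimal sketch. This is not the ShapleyVIC implementation or its API. It assumes that per-model importance values are already available as a models-by-variables table, and it substitutes a simple mean with a normal-approximation interval for the pooling used by the actual method. The function name `ensemble_ranking`, the variable names, and all numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def ensemble_ranking(importance, names, z=1.96):
    """Rank variables from per-model importance values (illustrative only).

    importance: list of rows, one row per model, one value per variable.
    names:      variable names, in the same order as the columns.
    z:          normal quantile for the interval (1.96 for roughly 95%).
    """
    n_models = len(importance)

    # Step 1: keep variables whose average importance across models is
    # clearly above zero (lower interval bound > 0). This is a simplifying
    # assumption standing in for the significance filter.
    keep = []
    for j in range(len(names)):
        column = [row[j] for row in importance]
        standard_error = stdev(column) / sqrt(n_models)
        if mean(column) - z * standard_error > 0:
            keep.append(j)

    # Step 2: within each model, rank the retained variables
    # (1 = most important). Ties are broken by column order.
    rank_sums = {j: 0 for j in keep}
    for row in importance:
        ordered = sorted(keep, key=lambda j: row[j], reverse=True)
        for rank, j in enumerate(ordered, start=1):
            rank_sums[j] += rank

    # Step 3: average the ranks across models and sort ascending.
    average_rank = {names[j]: rank_sums[j] / n_models for j in keep}
    return sorted(average_rank.items(), key=lambda item: item[1])


# Hypothetical importance values from three models for four variables.
names = ["age", "creatinine", "sodium", "pulse"]
importance = [
    [0.30, 0.10, 0.01, -0.02],
    [0.25, 0.15, -0.01, 0.00],
    [0.10, 0.20, 0.02, -0.01],
]

ranking = ensemble_ranking(importance, names)
for variable, avg_rank in ranking:
    print(f"{variable}: average rank {avg_rank:.2f}")
```

In this toy input, `sodium` and `pulse` are dropped because their intervals include zero, and the remaining variables are ordered by average rank. An ordered list of this kind is the sort of ranking that would then be passed to a score generator such as AutoScore.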