A key challenge in building effective regression models for large and diverse populations is accounting for patient heterogeneity. One example arises in health system risk modeling, where different combinations of comorbidities fundamentally alter the relationship between covariates and health outcomes. Accounting for heterogeneity arising from combinations of factors can yield more accurate and interpretable regression models. Yet with high-dimensional covariates, accounting for this type of heterogeneity can exacerbate estimation difficulties even when sample sizes are large. To address these issues, we propose a flexible and interpretable risk modeling approach based on semiparametric sufficient dimension reduction. The approach accounts for patient heterogeneity, borrows strength across related subpopulations to improve both estimation efficiency and interpretability, and can serve as a useful exploratory tool or as a powerful predictive model. In simulated examples, we show that our approach often improves estimation performance in the presence of heterogeneity and is quite robust to deviations from its key underlying assumptions. We apply our approach to an analysis of hospital admission risk for a large health system and demonstrate its predictive power on further follow-up data.