Healthcare cost prediction is challenging because the covariates are high-dimensional and highly correlated. The skewed, heavy-tailed, and often multi-modal nature of cost data, which arises from unobserved heterogeneity, complicates matters further. In this study, we propose a novel framework for finite mixture regression models that incorporates covariate clustering to better account for the effects of clustered covariates on subgroups of the outcome, enabling a more accurate characterization of the complex distribution of the data. The proposed framework can be formulated as a convex optimization problem with an additional penalty term based on the prior similarity of the covariates. To solve this optimization problem efficiently, we propose a specialized EM-ADMM algorithm that integrates the alternating direction method of multipliers (ADMM) into the iterations of the expectation-maximization (EM) algorithm. The convergence of the algorithm and the efficiency of the covariate clustering method are verified on simulated data, and the superiority of the approach over traditional regression techniques is demonstrated on two real-world medical expenditure datasets from China. Our empirical results provide valuable insights into the complex network graph of the covariates and can inform business practices, such as the design and pricing of medical insurance products.
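To make the overall structure concrete, the following is a minimal sketch of EM for a mixture of linear regressions with a similarity-based penalty on the coefficients. It is not the EM-ADMM algorithm of this study. As a simplifying assumption, the penalty is replaced by a quadratic graph-Laplacian term, so each M-step has a closed-form solution and no ADMM inner loop is needed. The function names, the similarity matrix `S`, and the tuning parameter `lam` are all hypothetical.

```python
import numpy as np

def laplacian_from_similarity(S):
    """Graph Laplacian L = D - S for a symmetric similarity matrix S (assumed given)."""
    return np.diag(S.sum(axis=1)) - S

def em_mixture_regression(X, y, L, K=2, lam=0.1, n_iter=50, seed=0):
    """EM-style fit of a K-component Gaussian mixture of linear regressions.

    Assumption: a quadratic penalty lam * beta_k' L beta_k stands in for the
    similarity penalty, giving a penalized weighted least-squares M-step.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = rng.normal(size=(K, p))        # random initial coefficients
    sigma2 = np.full(K, y.var() + 1e-8)   # component noise variances
    pi = np.full(K, 1.0 / K)              # mixing proportions
    ridge = 1e-8 * np.eye(p)              # tiny ridge for numerical stability
    for _ in range(n_iter):
        # E-step: posterior probability that each observation belongs to each component
        resid = y[:, None] - X @ beta.T
        log_dens = (np.log(np.clip(pi, 1e-12, None))
                    - 0.5 * np.log(2 * np.pi * sigma2)
                    - 0.5 * resid**2 / sigma2)
        log_dens -= log_dens.max(axis=1, keepdims=True)
        resp = np.exp(log_dens)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: penalized weighted least squares per component
        for k in range(K):
            w = resp[:, k]
            A = X.T @ (w[:, None] * X) + lam * L + ridge
            beta[k] = np.linalg.solve(A, X.T @ (w * y))
            r = y - X @ beta[k]
            sigma2[k] = max((w * r**2).sum() / max(w.sum(), 1e-12), 1e-8)
        pi = resp.mean(axis=0)
    return beta, sigma2, pi, resp
```

With a non-smooth penalty that fuses the coefficients of similar covariates, the closed-form solve in the M-step would no longer apply, which is where an ADMM inner loop would take its place.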