Survival analysis is a challenging variation of regression modeling because of censoring, where the outcome is only partially observed, for example due to loss to follow-up. Such problems come up frequently in medical applications, making survival analysis a key endeavor in biostatistics and machine learning for healthcare, with Cox regression among the most commonly employed models. We describe a new approach to survival regression based on learning mixtures of Cox regressions to model individual survival distributions. We propose an approximation to the Expectation Maximization algorithm for this model that makes hard assignments to mixture groups so that optimization is efficient. At each assignment step, we fit the hazard ratios within each group using deep neural networks and estimate the baseline hazard for each mixture component non-parametrically. We perform experiments on multiple real-world datasets and examine the mortality rates of patients across ethnicity and gender. We emphasize the importance of calibration in healthcare settings and demonstrate that our approach outperforms classical and modern survival analysis baselines in both discriminative performance and calibration, with large gains on minority demographics.
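To make two of the ingredients above concrete, here is a minimal sketch of a hard assignment step and of a non-parametric baseline hazard fit for one mixture component. It assumes the Breslow estimator for the baseline hazard, which is a common choice but is not named in the abstract. The function names `hard_assign` and `breslow_cumulative_hazard` are hypothetical, and the log hazard ratios are taken as given rather than produced by a neural network.

```python
import math
from typing import List, Sequence, Tuple


def hard_assign(posteriors: Sequence[Sequence[float]]) -> List[int]:
    """Assign each individual to its most probable mixture component.

    posteriors[i][k] is an (assumed given) posterior weight of component k
    for individual i. Hard assignment keeps only the argmax.
    """
    return [max(range(len(row)), key=lambda k: row[k]) for row in posteriors]


def breslow_cumulative_hazard(
    times: Sequence[float],
    events: Sequence[int],
    log_hazard_ratios: Sequence[float],
) -> List[Tuple[float, float]]:
    """Breslow estimate of the cumulative baseline hazard for one component.

    times: observed time per individual (event or censoring time).
    events: 1 if the event was observed, 0 if the individual was censored.
    log_hazard_ratios: model output per individual (assumed given here).
    Returns (event_time, cumulative_hazard) pairs sorted by time.
    """
    risk = [math.exp(s) for s in log_hazard_ratios]
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    curve: List[Tuple[float, float]] = []
    cumulative = 0.0
    for t in event_times:
        # Number of observed events at exactly time t.
        deaths = sum(1 for tj, ej in zip(times, events) if ej == 1 and tj == t)
        # Total relative risk of everyone still under observation at time t.
        at_risk = sum(r for tj, r in zip(times, risk) if tj >= t)
        cumulative += deaths / at_risk
        curve.append((t, cumulative))
    return curve
```

In a full training loop, the individuals assigned to each component by `hard_assign` would be passed to `breslow_cumulative_hazard` separately, so that every component receives its own baseline hazard. Censored individuals contribute to the risk set but never add a jump to the curve.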