We present deep significance clustering (DICE), a framework that jointly performs representation learning and clustering for "outcome-aware" stratification. DICE is intended to generate cluster memberships that can be used to categorize a population by individual risk level for a targeted outcome. Following the representation learning and clustering steps, we embed in the DICE objective function a constraint that requires a statistically significant association between the outcome and the cluster membership of the learned representations. DICE further includes a neural architecture search step to maximize both the likelihood of representation learning and the accuracy of outcome classification with cluster membership as the predictor. To demonstrate its utility in medicine for patient risk stratification, we evaluated DICE on two datasets with different outcome ratios, extracted from real-world electronic health records. The outcomes are acute kidney injury (30.4\%) among a cohort of COVID-19 patients and discharge disposition (36.8\%) among a cohort of heart failure patients. Extensive results demonstrate that DICE outperforms several baseline approaches as measured by the difference in outcome distribution across clusters; the Silhouette score, Calinski-Harabasz index, and Davies-Bouldin index for clustering; and the area under the ROC curve (AUC) for outcome classification.
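The two outcome-related checks named in the abstract, a significant association between cluster membership and outcome and an AUC with cluster membership as the predictor, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy data, the function names `chi_square_statistic` and `cluster_auc`, the use of a Pearson chi-square test of independence as a stand-in for whatever significance test DICE actually applies, and the choice to score each individual by the in-sample outcome rate of their cluster (which is optimistic compared with a held-out evaluation).

```python
from collections import Counter


def chi_square_statistic(clusters, outcomes):
    """Pearson chi-square statistic and degrees of freedom for the
    cluster-by-outcome contingency table (illustrative stand-in test)."""
    n = len(clusters)
    cell = Counter(zip(clusters, outcomes))  # observed counts per (cluster, outcome)
    row = Counter(clusters)                  # cluster sizes
    col = Counter(outcomes)                  # outcome totals
    stat = 0.0
    for k in row:
        for y in col:
            expected = row[k] * col[y] / n   # count expected under independence
            stat += (cell[(k, y)] - expected) ** 2 / expected
    dof = (len(row) - 1) * (len(col) - 1)
    return stat, dof


def cluster_auc(clusters, outcomes):
    """AUC when each individual is scored by the outcome rate of their cluster.
    Computed as the fraction of (positive, negative) pairs ranked correctly,
    with ties counted as half."""
    size = Counter(clusters)
    positives = Counter(k for k, y in zip(clusters, outcomes) if y == 1)
    scores = [positives[k] / size[k] for k in clusters]
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three clusters of ten individuals each,
# with 1, 3, and 8 positive outcomes respectively.
clusters = [0] * 10 + [1] * 10 + [2] * 10
outcomes = [1] * 1 + [0] * 9 + [1] * 3 + [0] * 7 + [1] * 8 + [0] * 2

stat, dof = chi_square_statistic(clusters, outcomes)
auc = cluster_auc(clusters, outcomes)

# 5.991 is the 0.05 critical value of the chi-square distribution with 2 degrees of freedom.
significant = dof == 2 and stat > 5.991
print(f"chi-square = {stat:.2f} (dof = {dof}), significant at 0.05: {significant}")
print(f"AUC with cluster membership as predictor = {auc:.3f}")
```

In this sketch, a clustering whose clusters differ clearly in outcome rate yields both a large chi-square statistic and an AUC well above 0.5, which is the kind of "outcome-aware" separation the abstract describes.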