Kernel survival analysis models estimate individual survival distributions with the help of a kernel function, which measures the similarity between any two data points. Such a kernel function can be learned using deep kernel survival models. In this paper, we present a new deep kernel survival model called a survival kernet, which scales to large datasets in a manner that is amenable to both model interpretation and theoretical analysis. Specifically, the training data are partitioned into clusters based on a recently developed training set compression scheme for classification and regression called kernel netting, which we extend to the survival analysis setting. At test time, each data point is represented as a weighted combination of these clusters, and each such cluster can be visualized. For a special case of survival kernets, we establish a finite-sample error bound on predicted survival distributions that is, up to a log factor, optimal. Whereas scalability at test time is achieved via the aforementioned kernel netting compression strategy, scalability during training is achieved by a warm-start procedure based on tree ensembles such as XGBoost and by a heuristic approach to accelerating neural architecture search. On three standard survival analysis datasets of varying sizes (up to roughly 3 million data points), we show that survival kernets are highly competitive with the best-performing baselines tested in terms of concordance index. Our code is available at: https://github.com/georgehc/survival-kernets
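To make the prediction step concrete, the sketch below illustrates the general idea of a kernel-weighted survival estimate over compressed cluster summaries, as the abstract describes it: a test point's embedding is compared against cluster exemplars, and its predicted survival curve is a weighted combination of per-cluster curves. This is a minimal illustration under simplifying assumptions (a Gaussian kernel on embeddings, precomputed per-cluster survival curves); the function names and the exact estimator are hypothetical and not the paper's precise Kaplan-Meier-based construction.

```python
import numpy as np

def kernel_weights(z, exemplars):
    """Normalized Gaussian kernel similarity between a test embedding z
    and cluster exemplar embeddings (assumed kernel choice)."""
    sq_dists = np.sum((exemplars - z) ** 2, axis=1)
    w = np.exp(-sq_dists)
    return w / w.sum()

def predict_survival(z, exemplars, cluster_curves):
    """Predicted survival curve as a kernel-weighted mixture of
    per-cluster survival curves (simplified sketch, not the paper's
    exact estimator)."""
    w = kernel_weights(z, exemplars)
    # cluster_curves: shape (num_clusters, num_times); each row is a
    # monotone nonincreasing survival curve on a shared time grid
    return w @ cluster_curves

# Toy example: two clusters in a 2-D embedding space.
exemplars = np.array([[0.0, 0.0],
                      [3.0, 3.0]])
cluster_curves = np.array([[0.9, 0.7, 0.4],
                           [0.8, 0.5, 0.2]])
z = np.array([0.1, 0.0])  # test point near cluster 0
S = predict_survival(z, exemplars, cluster_curves)
```

Because the test point sits essentially on top of the first exemplar, nearly all of the kernel weight falls on cluster 0, so the predicted curve is close to that cluster's curve; the cluster weights themselves are what make each prediction interpretable in terms of the training clusters.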