Interpreting the critical variables involved in complex biological processes related to survival time can help explain the predictions of survival models, evaluate treatment efficacy, and develop new therapies for patients. Although deep learning (DL)-based models currently predict as well as or better than standard survival methods, they are often disregarded because they lack transparency and offer little interpretability, a property that is crucial to their adoption in clinical applications. In this paper, we introduce a novel, easily deployable approach, called EXplainable CEnsored Learning (EXCEL), which iteratively exploits critical variables and simultaneously performs (DL) model training based on these variables. First, we illustrate the principle of EXCEL on a toy dataset; then, we analyze the proposed method mathematically, deriving and proving tight generalization error bounds; next, on two semi-synthetic datasets, we show that EXCEL is robust to noise and stable; finally, we apply EXCEL to a variety of real-world survival datasets, including clinical and genetic data, and demonstrate that it effectively identifies critical features while achieving performance on par with or better than the original models. Notably, EXCEL can be flexibly deployed in existing or emerging models for explainable analysis of survival data in the presence of right censoring.