Utilizing clinical texts in survival analysis is difficult because they are largely unstructured. Current automatic extraction models fail to capture textual information comprehensively since their labels are limited in scope. Furthermore, they typically require a large amount of data and high-quality expert annotations for training. In this work, we present a novel method of using BERT-based hidden layer representations of clinical texts as covariates for proportional hazards models to predict patient survival outcomes. We show that hidden layers yield notably more accurate predictions than predefined features, outperforming the previous baseline model by 5.7% on average across C-index and time-dependent AUC. We make our work publicly available at https://github.com/bionlplab/heart_failure_mortality.
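To make the idea of text representations as covariates concrete, the following is a minimal sketch, not the authors' implementation. It assumes each clinical note has already been reduced to a fixed-length vector. The tiny 2-dimensional vectors here are hypothetical stand-ins for BERT hidden-layer representations, which are much higher dimensional in practice. The function names, the toy data, and the coefficient vector `beta` are all illustrative assumptions. The sketch evaluates the Cox proportional hazards negative log partial likelihood for those vectors and scores the resulting risk ranking with Harrell's C-index.

```python
import math

def risk_scores(features, beta):
    """Linear predictor x . beta for each patient."""
    return [sum(x_j * b_j for x_j, b_j in zip(x, beta)) for x in features]

def neg_log_partial_likelihood(features, times, events, beta):
    """Cox negative log partial likelihood (assumes no tied event times)."""
    eta = risk_scores(features, beta)
    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients contribute only through risk sets
        # Risk set: every patient still under observation at time t_i.
        denom = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        total -= eta[i] - math.log(denom)
    return total

def concordance_index(times, events, scores):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical toy data: one vector per patient standing in for a
# hidden-layer representation of that patient's clinical text.
embeddings = [[0.9, 0.1], [0.4, 0.3], [0.1, 0.8], [0.7, 0.2]]
times = [2.0, 5.0, 9.0, 3.0]   # follow-up time per patient
events = [1, 1, 0, 1]          # 1 = death observed, 0 = censored
beta = [1.0, -1.0]             # illustrative coefficients, not fitted values

nll_null = neg_log_partial_likelihood(embeddings, times, events, [0.0, 0.0])
nll_beta = neg_log_partial_likelihood(embeddings, times, events, beta)
c_index = concordance_index(times, events, risk_scores(embeddings, beta))
print(nll_null, nll_beta, c_index)
```

In a real pipeline the coefficients would be fitted by minimizing this likelihood, typically with an existing survival analysis library, and the time-dependent AUC would be reported alongside the C-index.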