We propose a domain adaptation approach for applying the DeBERTa model to electronic health record (EHR) tasks. We pretrain a small DeBERTa model on a corpus of MIMIC-III discharge summaries, clinical notes, and radiology reports, together with PubMed abstracts. We compare this model against a DeBERTa model pretrained on clinical texts from our institution's EHR (MeDeBERTa) and against an XGBoost model. We evaluate all models on three benchmark tasks for emergency department outcome prediction using the MIMIC-IV-ED dataset. We preprocess the tabular data by converting it into text, and we generate four versions of the original datasets to compare different data processing and data inclusion choices. Our proposed approach outperforms the alternative models on two of the three tasks (p<0.001) and matches their performance on the third. In addition, replacing the original column names with descriptive ones improves performance.
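The abstract does not specify how the tabular records are turned into text. As a minimal sketch, assuming a simple comma-separated `column: value` template, the comparison between original and descriptive column names could look like the following. The column names are assumed MIMIC-IV-ED-style vital-sign fields, while the `DESCRIPTIVE_NAMES` mapping, the `row_to_text` helper, the choice to skip missing values, and the example values are all hypothetical and are not taken from the paper.

```python
from typing import Mapping, Optional

# Hypothetical mapping from raw column names to descriptive names
# (illustrative only; not the mapping used in the paper).
DESCRIPTIVE_NAMES = {
    "heartrate": "heart rate",
    "resprate": "respiratory rate",
    "o2sat": "oxygen saturation",
    "sbp": "systolic blood pressure",
    "dbp": "diastolic blood pressure",
}


def row_to_text(row: Mapping[str, object],
                names: Optional[Mapping[str, str]] = None) -> str:
    """Serialize one tabular record as comma-separated 'column: value' pairs.

    If `names` is given, each raw column name is replaced by its descriptive
    name; columns without a mapping keep their original name.
    Missing values (None) are skipped, which is an assumption of this sketch.
    """
    parts = []
    for col, value in row.items():
        if value is None:
            continue  # assumed handling: omit missing measurements
        label = names.get(col, col) if names else col
        parts.append(f"{label}: {value}")
    return ", ".join(parts)


# Made-up example record (not real patient data).
record = {"heartrate": 98, "resprate": 18, "o2sat": 97, "sbp": 132, "dbp": None}

print(row_to_text(record))
# heartrate: 98, resprate: 18, o2sat: 97, sbp: 132
print(row_to_text(record, DESCRIPTIVE_NAMES))
# heart rate: 98, respiratory rate: 18, oxygen saturation: 97, systolic blood pressure: 132
```

The resulting strings would then be tokenized and fed to a text model such as DeBERTa, whereas XGBoost would consume the original numeric columns directly.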