Symptom information is primarily documented in free-text clinical notes and is not directly accessible for downstream applications. To address this challenge, information extraction approaches that can handle clinical language variation across different institutions and specialties are needed. In this paper, we present domain generalization for symptom extraction using pretraining and fine-tuning data that differ from the target domain in institution, specialty, and/or patient population. We extract symptom events using a transformer-based joint entity and relation extraction method. To reduce reliance on domain-specific features, we propose a domain generalization method that dynamically masks frequent symptom words in the source domain. Additionally, we pretrain the transformer language model (LM) on task-related unlabeled text for better representation. Our experiments indicate that the masking and adaptive pretraining methods can significantly improve performance when the source domain is more distant from the target domain.
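The dynamic masking idea described above could be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the function names, the top-k frequency cutoff, and the masking probability are all assumptions introduced here for clarity.

```python
import random
from collections import Counter

def build_mask_vocab(symptom_spans, top_k=100):
    """Collect the most frequent symptom words in the source domain.

    symptom_spans: a list per sentence of annotated symptom words
    (a hypothetical format; the paper's annotation scheme may differ).
    """
    counts = Counter(w.lower() for spans in symptom_spans for w in spans)
    return {w for w, _ in counts.most_common(top_k)}

def dynamic_mask(tokens, mask_vocab, p=0.5, mask_token="[MASK]", rng=None):
    """Replace frequent source-domain symptom words with a mask token
    with probability p, so the extractor learns to rely on context
    rather than memorized domain-specific surface forms."""
    rng = rng or random.Random(0)
    return [mask_token if t.lower() in mask_vocab and rng.random() < p else t
            for t in tokens]
```

Applied at training time, each epoch sees a different masked view of the source-domain sentences, which is where the "dynamic" aspect comes from.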