Objective: The n2c2/UW SDOH Challenge explores the extraction of social determinants of health (SDOH) information from clinical notes. Its objectives include advancing natural language processing (NLP) information extraction techniques for SDOH and for clinical information more broadly. This paper presents the shared task, data, participating teams, performance results, and considerations for future work. Materials and Methods: The task used the Social History Annotated Corpus (SHAC), which consists of clinical text with detailed event-based annotations for SDOH events such as alcohol, drug, tobacco, employment, and living situation. Each SDOH event is characterized through attributes related to status, extent, and temporality. The task includes three subtasks related to information extraction (Subtask A), generalizability (Subtask B), and learning transfer (Subtask C). In addressing this task, participants used a range of techniques, including rules, knowledge bases, n-grams, word embeddings, and pretrained language models (LMs). Results: A total of 15 teams participated, and the top teams used pretrained deep learning LMs. The top team across all subtasks used a sequence-to-sequence approach, achieving 0.901 F1 for Subtask A, 0.774 F1 for Subtask B, and 0.889 F1 for Subtask C. Conclusions: As in many NLP tasks and domains, pretrained LMs yielded the best performance, including for generalizability and learning transfer. An error analysis indicates that extraction performance varies by SDOH, with lower performance for conditions that increase health risks (risk factors), such as substance use and homelessness, and higher performance for conditions that reduce health risks (protective factors), such as substance abstinence and living with family.
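To make the event-based framing concrete, the following is a minimal sketch of how an SDOH event with status, extent, and temporality attributes might be represented, and how an F1 score over extracted events could be computed. The class name `SDOHEvent`, its field names, the label values, and the exact-match criterion are all illustrative assumptions; they are not the SHAC annotation schema or the challenge's official scoring procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SDOHEvent:
    """Hypothetical event record; field names are assumptions, not SHAC's schema."""
    event_type: str        # e.g. "Alcohol", "Drug", "Tobacco", "Employment", "LivingStatus"
    trigger: str           # text span that signals the event
    status: str = ""       # illustrative label such as "current", "past", "none"
    extent: str = ""       # illustrative free-text amount or frequency
    temporality: str = ""  # illustrative free-text time expression


def f1_score(gold, predicted):
    """F1 over events, counting a prediction as correct only on exact match.

    Exact matching of every field is an assumption made for simplicity.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented example data, not taken from SHAC.
gold = [
    SDOHEvent("Tobacco", "smoking", status="past"),
    SDOHEvent("Alcohol", "drinks", status="current", extent="2 beers daily"),
]
predicted = [
    SDOHEvent("Tobacco", "smoking", status="past"),
    SDOHEvent("Alcohol", "drinks", status="none"),  # wrong status, missing extent
]

score = f1_score(gold, predicted)
print(f"F1 = {score:.3f}")
```

In this toy case one of the two predicted events matches a gold event exactly, so precision and recall are both 0.5. A more lenient scorer could credit partially correct events, for example by matching triggers and attributes separately.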