Objective: To develop a natural language processing (NLP) system to extract medications and the contextual information needed to understand drug changes. This project is part of the 2022 n2c2 challenge. Materials and methods: We developed NLP systems for medication mention extraction, event classification (indicating whether a medication change is discussed), and context classification, which classifies the context of medication changes along 5 orthogonal dimensions. We explored 6 state-of-the-art pretrained transformer models for the three subtasks, including GatorTron, a large language model pretrained on >90 billion words of text (including >80 billion words from >290 million clinical notes identified at the University of Florida Health). We evaluated our NLP systems using the annotated data and evaluation scripts provided by the 2022 n2c2 organizers. Results: Our GatorTron models achieved the best F1-scores of 0.9828 for medication extraction (ranked 3rd) and 0.9379 for event classification (ranked 2nd), and the best micro-average accuracy of 0.9126 for context classification. GatorTron outperformed existing transformer models pretrained on smaller general English and clinical text corpora, indicating the advantage of large language models. Conclusion: This study demonstrated the advantage of using large transformer models for contextual medication information extraction from clinical narratives.