A long-running goal of the clinical NLP community is the extraction of important variables trapped in clinical notes. However, roadblocks have included dataset shift from the general domain and a lack of public clinical corpora and annotations. In this work, we show that large language models, such as InstructGPT, perform well at zero- and few-shot information extraction from clinical text despite not being trained specifically for the clinical domain. Whereas text classification and generation performance have already been studied extensively in such models, here we additionally demonstrate how to leverage them to tackle a diverse set of NLP tasks which require more structured outputs, including span identification, token-level sequence classification, and relation extraction. Further, due to the dearth of available data to evaluate these systems, we introduce new datasets for benchmarking few-shot clinical information extraction based on a manual re-annotation of the CASI dataset for new tasks. On the clinical extraction tasks we studied, the GPT-3 systems significantly outperform existing zero- and few-shot baselines.