There is increasing interest in developing artificial intelligence (AI) systems to process and interpret electronic health records (EHRs). Natural language processing (NLP) powered by pretrained language models is the key technology enabling medical AI systems to utilize clinical narratives. However, there are few clinical language models, and the largest one trained in the clinical domain is comparatively small at 110 million parameters (compared with billions of parameters for models in the general domain). It is not clear how clinical language models with billions of parameters can help medical AI systems utilize unstructured EHRs. In this study, we develop from scratch a large clinical language model - GatorTron - using >90 billion words of text (including >82 billion words of de-identified clinical text) and systematically evaluate it on 5 clinical NLP tasks: clinical concept extraction, medical relation extraction, semantic textual similarity, natural language inference (NLI), and medical question answering (MQA). We examine how (1) scaling up the number of model parameters and (2) scaling up the size of the training data could benefit these NLP tasks. GatorTron scales the clinical language model from 110 million to 8.9 billion parameters and improves performance on all 5 clinical NLP tasks (e.g., a 9.6% and 9.5% accuracy improvement for NLI and MQA, respectively), and it can be applied to medical AI systems to improve healthcare delivery. The GatorTron models are publicly available at: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/gatortron_og.
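As an illustration of how a released GatorTron checkpoint might be used as a feature extractor for downstream clinical NLP tasks, the sketch below loads an encoder through the Hugging Face transformers library and pools contextual embeddings for a de-identified clinical sentence. This is a minimal sketch, not the authors' pipeline: it assumes the checkpoint has been converted to a transformers-compatible format, and the model identifier UFNLP/gatortron-base is illustrative.

    # Minimal sketch: sentence embeddings from a GatorTron encoder.
    # Assumes a transformers-compatible checkpoint; the identifier
    # "UFNLP/gatortron-base" is an assumption, not from the paper.
    import torch
    from transformers import AutoTokenizer, AutoModel

    MODEL_ID = "UFNLP/gatortron-base"  # illustrative identifier

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModel.from_pretrained(MODEL_ID)
    model.eval()

    # A de-identified clinical sentence, as in the training corpus.
    text = "Patient denies chest pain but reports dyspnea on exertion."
    inputs = tokenizer(text, return_tensors="pt",
                       truncation=True, max_length=512)

    with torch.no_grad():
        outputs = model(**inputs)

    # Mean-pool the final hidden states over non-padding tokens to get
    # one sentence vector, usable as features for tasks such as
    # semantic textual similarity.
    mask = inputs["attention_mask"].unsqueeze(-1)
    sentence_embedding = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)
    print(sentence_embedding.shape)  # e.g., torch.Size([1, 1024])

In practice, the pooled embedding would feed a task-specific head (e.g., a classifier for NLI), or the encoder would be fine-tuned end to end on the target clinical NLP task.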