Extracting Thyroid Nodules Characteristics from Ultrasound Reports Using Transformer-based Natural Language Processing Methods

The ultrasound characteristics of thyroid nodules guide the evaluation of thyroid cancer in patients with thyroid nodules. However, these characteristics are often documented only in clinical narratives such as ultrasound reports. Previous studies have used rule-based natural language processing (NLP) systems to extract a limited number of characteristics (fewer than 9). In this study, a multidisciplinary team of NLP experts and thyroid specialists identified thyroid nodule characteristics that are important for clinical care, composed annotation guidelines, developed a corpus, and compared 5 state-of-the-art transformer-based NLP methods (BERT, RoBERTa, LongFormer, DeBERTa, and GatorTron) for extracting thyroid nodule characteristics from ultrasound reports. Our GatorTron model, a transformer-based large language model trained on over 90 billion words of text, achieved the best strict and lenient F1-scores of 0.8851 and 0.9495, respectively, for extracting a total of 16 thyroid nodule characteristics, and the best F1-score of 0.9321 for linking characteristics to nodules, outperforming the other clinical transformer models. To the best of our knowledge, this is the first study to systematically categorize a large number of clinically relevant thyroid nodule characteristics and apply transformer-based NLP models to extract them from ultrasound reports. This study lays the groundwork for assessing the documentation quality of thyroid ultrasound reports and for examining outcomes of patients with thyroid nodules using electronic health records.
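
The difference between the strict and lenient F1-scores reported above can be illustrated with a minimal sketch. It assumes a common convention that the abstract itself does not spell out: a strict match requires identical span boundaries and label, while a lenient match only requires overlapping spans with the same label. The function names, the character-offset span format, and the example labels are hypothetical and are not taken from the study's evaluation code.

```python
from typing import List, Tuple

# A span is (start offset, end offset exclusive, label). Hypothetical format.
Span = Tuple[int, int, str]


def _strict_match(a: Span, b: Span) -> bool:
    """Assumed strict criterion: identical boundaries and label."""
    return a == b


def _lenient_match(a: Span, b: Span) -> bool:
    """Assumed lenient criterion: same label and overlapping offsets."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def f1_score(gold: List[Span], pred: List[Span], lenient: bool = False) -> float:
    """Span-level F1 under the chosen matching criterion."""
    match = _lenient_match if lenient else _strict_match
    # Predicted spans that match some gold span (for precision).
    matched_pred = sum(any(match(p, g) for g in gold) for p in pred)
    # Gold spans that are matched by some predicted span (for recall).
    matched_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy annotations with illustrative labels only.
gold = [(0, 10, "size"), (15, 27, "echogenicity"), (30, 36, "margin")]
pred = [(0, 10, "size"), (15, 22, "echogenicity"), (40, 45, "shape")]

strict_f1 = f1_score(gold, pred)
lenient_f1 = f1_score(gold, pred, lenient=True)
print(f"strict F1 = {strict_f1:.4f}, lenient F1 = {lenient_f1:.4f}")
```

In this toy case the second prediction has the right label but a truncated boundary, so it counts only under the lenient criterion. That is why a lenient score is never lower than the strict score for the same predictions.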
