RadGraph: Extracting Clinical Entities and Relations from Radiology Reports

Saahil Jain, Ashwin Agrawal, Adriel Saporta, Steven QH Truong, Du Nguyen Duong, Tan Bui, Pierre Chambon, Yuhao Zhang, Matthew P. Lungren, Andrew Y. Ng, Curtis P. Langlotz, Pranav Rajpurkar

From arXiv. Accepted to the 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Track on Datasets and Benchmarks.

Extracting structured clinical information from free-text radiology reports can enable the use of radiology report information for a variety of critical healthcare applications. In our work, we present RadGraph, a dataset of entities and relations in full-text chest X-ray radiology reports based on a novel information extraction schema we designed to structure radiology reports. We release a development dataset, which contains board-certified radiologist annotations for 500 radiology reports from the MIMIC-CXR dataset (14,579 entities and 10,889 relations), and a test dataset, which contains two independent sets of board-certified radiologist annotations for 100 radiology reports split equally across the MIMIC-CXR and CheXpert datasets. Using these datasets, we train and test a deep learning model, RadGraph Benchmark, that achieves a micro F1 of 0.82 and 0.73 on relation extraction on the MIMIC-CXR and CheXpert test sets respectively. Additionally, we release an inference dataset, which contains annotations automatically generated by RadGraph Benchmark across 220,763 MIMIC-CXR reports (around 6 million entities and 4 million relations) and 500 CheXpert reports (13,783 entities and 9,908 relations) with mappings to associated chest radiographs. Our freely available dataset can facilitate a wide range of research in medical natural language processing, as well as computer vision and multi-modal learning when linked to chest radiographs.
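
The abstract reports micro F1 for relation extraction. As a rough illustration of how such a score can be computed, the sketch below pools true positives, false positives, and false negatives over all reports before taking precision and recall. It is not the authors' evaluation code: the `(head, label, tail)` tuple format, the function name `micro_f1`, and the example labels are assumptions made for illustration only.

```python
from typing import List, Set, Tuple

# Hypothetical relation format: (head entity, relation label, tail entity).
Relation = Tuple[str, str, str]


def micro_f1(gold: List[Set[Relation]], pred: List[Set[Relation]]) -> float:
    """Micro-averaged F1 over per-report sets of relations.

    Counts are summed across all reports first, so reports with many
    relations weigh more than reports with few.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same reports")

    tp = fp = fn = 0
    for gold_rels, pred_rels in zip(gold, pred):
        tp += len(gold_rels & pred_rels)   # predicted and correct
        fp += len(pred_rels - gold_rels)   # predicted but not in gold
        fn += len(gold_rels - pred_rels)   # in gold but missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data with placeholder labels, not taken from the dataset.
    gold = [
        {("effusion", "located_at", "pleural")},
        {("opacity", "located_at", "lung"), ("opacity", "suggestive_of", "pneumonia")},
    ]
    pred = [
        {("effusion", "located_at", "pleural")},
        {("opacity", "located_at", "lung"), ("opacity", "suggestive_of", "edema")},
    ]
    print(f"micro F1 = {micro_f1(gold, pred):.3f}")
```

Micro-averaging is used here because it matches the metric named in the abstract. A macro average would instead compute F1 per report (or per relation type) and then average those scores.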
