Extracting structured clinical information from free-text radiology reports can enable the use of radiology report information for a variety of critical healthcare applications. In our work, we present RadGraph, a dataset of entities and relations in full-text chest X-ray radiology reports based on a novel information extraction schema we designed to structure radiology reports. We release a development dataset, which contains board-certified radiologist annotations for 500 radiology reports from the MIMIC-CXR dataset (14,579 entities and 10,889 relations), and a test dataset, which contains two independent sets of board-certified radiologist annotations for 100 radiology reports split equally across the MIMIC-CXR and CheXpert datasets. Using these datasets, we train and test a deep learning model, RadGraph Benchmark, that achieves a micro F1 of 0.82 and 0.73 on relation extraction on the MIMIC-CXR and CheXpert test sets, respectively. Additionally, we release an inference dataset, which contains annotations automatically generated by RadGraph Benchmark across 220,763 MIMIC-CXR reports (around 6 million entities and 4 million relations) and 500 CheXpert reports (13,783 entities and 9,908 relations) with mappings to associated chest radiographs. Our freely available dataset can facilitate a wide range of research in medical natural language processing, as well as computer vision and multi-modal learning when linked to chest radiographs.
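The relation-extraction results are reported as micro F1. As a rough illustration of what micro averaging means, the sketch below pools true positives, predicted relations, and gold relations across all reports before computing precision and recall, rather than averaging per-report scores. It assumes relations are compared as exact-match (head, relation, tail) triples; the triple representation, the function name `micro_f1`, and the relation labels in the example are hypothetical and are not taken from the RadGraph Benchmark evaluation code.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: (head entity text, relation label, tail entity text).
Triple = Tuple[str, str, str]


def micro_f1(gold_per_report: Iterable[Set[Triple]],
             pred_per_report: Iterable[Set[Triple]]) -> float:
    """Micro-averaged F1 over relation triples, pooling counts across reports.

    Assumes exact-match comparison of triples; the two iterables are paired
    report by report (zip stops at the shorter one).
    """
    tp = n_pred = n_gold = 0
    for gold, pred in zip(gold_per_report, pred_per_report):
        tp += len(gold & pred)   # triples present in both gold and prediction
        n_pred += len(pred)      # all predicted triples in this report
        n_gold += len(gold)      # all gold triples in this report
    if n_pred == 0 or n_gold == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative data only: two reports with made-up triples.
gold = [
    {("opacity", "located_at", "lobe"), ("effusion", "located_at", "base")},
    {("clear", "modify", "lungs")},
]
pred = [
    {("opacity", "located_at", "lobe"), ("edema", "located_at", "base")},
    {("clear", "modify", "lungs")},
]
print(micro_f1(gold, pred))  # 2 TP, 3 predicted, 3 gold -> P = R = F1 = 2/3
```

Because counts are pooled, reports with many relations carry more weight than reports with few, which is the usual reason micro F1 is preferred when report lengths vary widely.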