Beyond their primary diagnostic purpose, radiology reports are an invaluable source of information for medical research. Given a corpus of radiology reports, researchers often want to identify the subset of reports that describe a particular medical finding. Because the space of medical findings in radiology reports is vast and potentially unlimited, recent studies have proposed mapping free-text statements in radiology reports to semi-structured strings of terms taken from a limited vocabulary. This paper presents an approach for automatically generating semi-structured representations of radiology reports. The approach first matches sentences from radiology reports to manually created semi-structured representations, and then trains a sequence-to-sequence neural model that maps the matched sentences to their semi-structured representations. We evaluated the proposed approach on the OpenI corpus of manually annotated chest X-ray radiology reports. The results indicate that the proposed approach outperforms several baselines, both in terms of (1) quantitative measures such as BLEU, ROUGE, and METEOR and (2) the qualitative judgment of a radiologist. The results also demonstrate that the trained model produces reasonable semi-structured representations on an out-of-sample corpus of chest X-ray radiology reports from a different medical provider.
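To make the first step of the approach concrete, the following is a minimal sketch of how report sentences might be paired with manually created semi-structured representations before a sequence-to-sequence model is trained on the pairs. The abstract does not specify the matching criterion, so the token-overlap (Jaccard) score, the `threshold` value, the function names, and the slash-separated annotation string are all assumptions made for illustration, not the paper's method.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_sentences(sentences, annotations, threshold=0.2):
    """Pair each annotation with its best-overlapping report sentence.

    The overlap criterion and the threshold are hypothetical choices;
    annotations with no sentence scoring at least `threshold` are skipped.
    """
    pairs = []
    for annotation in annotations:
        annotation_tokens = tokenize(annotation)
        best_sentence, best_score = None, 0.0
        for sentence in sentences:
            score = jaccard(tokenize(sentence), annotation_tokens)
            if score > best_score:
                best_sentence, best_score = sentence, score
        if best_sentence is not None and best_score >= threshold:
            pairs.append((best_sentence, annotation))
    return pairs


# Illustrative inputs only; these are not taken from the OpenI corpus.
report_sentences = [
    "The heart is normal in size.",
    "There is a small left pleural effusion.",
    "No pneumothorax is seen.",
]
manual_annotations = ["pleural effusion/left/small"]

pairs = match_sentences(report_sentences, manual_annotations)
for sentence, annotation in pairs:
    print(f"{sentence!r} -> {annotation!r}")
```

The resulting (sentence, representation) pairs would serve as source and target sequences for training the sequence-to-sequence model; a simple lexical score like this one is only a stand-in for whatever matching procedure the full paper describes.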