Radiology reports are unstructured and contain the imaging findings and corresponding diagnoses transcribed by radiologists, which include clinical facts as well as negated and/or uncertain statements. Extracting pathologic findings and diagnoses from radiology reports is important for quality control, population health, and monitoring of disease progression. Existing work relies primarily either on rule-based systems or on fine-tuning transformer-based pre-trained models, but fails to take factual and uncertain information into account and therefore generates false-positive outputs. In this work, we introduce three sedulous augmentation techniques that retain factual and critical information while generating augmentations for contrastive learning. We introduce RadBERT-CL, which fuses this information into BlueBert via a self-supervised contrastive loss. Our experiments on MIMIC-CXR show superior performance of RadBERT-CL when fine-tuned for multi-class, multi-label report classification. We show that when few labeled data are available, RadBERT-CL outperforms conventional SOTA transformers (BERT/BlueBert) by significantly larger margins (6-11%). We also show that the representations learned by RadBERT-CL capture critical medical information in the latent space.
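The abstract's self-supervised contrastive objective can be illustrated with a standard NT-Xent loss over paired embeddings of two augmented views of each report. This is a minimal numpy sketch of that family of losses, not the paper's exact implementation; the function name, temperature value, and shapes are illustrative assumptions.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent (normalized temperature-scaled cross-entropy) contrastive loss.

    z1, z2: (N, d) embeddings of two augmented views of the same N reports;
    row i of z1 and row i of z2 form a positive pair. All other rows in the
    concatenated batch serve as negatives. (Illustrative sketch only.)
    """
    z = np.concatenate([z1, z2], axis=0)              # (2N, d) joint batch
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize rows
    sim = z @ z.T / temperature                       # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    n = z1.shape[0]
    # each example i in the first half pairs with i+n in the second half
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # cross-entropy of the positive against all candidates in the row
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos] - logsumexp)
    return loss.mean()
```

In a RadBERT-CL-style setup, z1 and z2 would be the encoder outputs for two fact-preserving augmentations of the same report, so the loss pulls views of the same report together while pushing apart views of different reports.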