Chest radiography is the most common radiographic examination performed in daily clinical practice for the detection of various heart and lung abnormalities. The large volume of studies to be read and reported, more than 100 per day for a single radiologist, makes it challenging to maintain consistently high interpretation accuracy. The introduction of large-scale public datasets has led to a series of novel systems for automated abnormality classification. However, the labels of these datasets were obtained by applying natural language processing to medical reports, which yields a large degree of label noise that can degrade classification performance. In this study, we propose novel training strategies that handle label noise from such suboptimal data. Prior label probabilities were measured on a subset of the training data re-read by 4 board-certified radiologists and were used during training to increase the robustness of the trained model to label noise. Furthermore, we exploit the high comorbidity of abnormalities observed in chest radiography and incorporate this information to further reduce the impact of label noise. Additionally, anatomical knowledge is incorporated by training the system to predict lung and heart segmentation, as well as spatial knowledge labels. To deal with multiple datasets and images derived from various scanners that apply different post-processing techniques, we introduce a novel image normalization strategy. Experiments were performed on an extensive collection of 297,541 chest radiographs from 86,876 patients, leading to a state-of-the-art performance level for 17 abnormalities from 2 datasets. With an average AUC score of 0.880 across all abnormalities, our proposed training strategies can be used to significantly improve performance scores.
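The abstract does not state how the prior label probabilities enter the training objective. As one possible illustration only, and not the authors' formulation, the sketch below replaces each hard, report-derived label with the probability that the abnormality is truly present given that label, and then applies binary cross-entropy to the resulting soft target. The function names and the two prior parameters (`p_pos_given_flag`, `p_pos_given_clear`) are hypothetical and would in practice be estimated from the radiologist re-read subset.

```python
import math


def soft_target(noisy_label, p_pos_given_flag, p_pos_given_clear):
    """Expected true label given the report-derived (noisy) label.

    Assumption: p_pos_given_flag is P(truly present | label == 1) and
    p_pos_given_clear is P(truly present | label == 0), both estimated
    on a re-read subset. These names are illustrative, not from the paper.
    """
    return p_pos_given_flag if noisy_label == 1 else p_pos_given_clear


def noise_aware_bce(pred, noisy_label, p_pos_given_flag, p_pos_given_clear,
                    eps=1e-7):
    """Binary cross-entropy against the soft target instead of the hard label."""
    # Clamp the prediction so that log() never receives 0.
    pred = min(max(pred, eps), 1.0 - eps)
    t = soft_target(noisy_label, p_pos_given_flag, p_pos_given_clear)
    return -(t * math.log(pred) + (1.0 - t) * math.log(1.0 - pred))


if __name__ == "__main__":
    # Hypothetical priors: a flagged label is correct 90% of the time,
    # and 5% of unflagged images actually show the abnormality.
    print(noise_aware_bce(0.8, 1, 0.90, 0.05))
    print(noise_aware_bce(0.8, 0, 0.90, 0.05))
```

With priors of 1.0 and 0.0 this reduces to ordinary binary cross-entropy; with imperfect priors, the loss is minimized when the prediction equals the soft target, so the model is not pushed toward full confidence in a label that may be wrong.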