Many large-scale chest X-ray datasets have been curated for detecting abnormalities with deep learning, with the potential to provide substantial benefits across many clinical applications. However, each of these datasets covers only a subset of the disease labels that could be present, which limits its clinical utility. Furthermore, the distributed nature of these datasets, along with data sharing regulations, makes it difficult to pool them into a single dataset with a complete representation of disease labels. To that end, we propose surgical aggregation, a federated learning framework for aggregating and harmonizing knowledge from distributed datasets with different disease labels into a 'global' deep learning model. We used surgical aggregation to harmonize the NIH (14 labels) and CheXpert (13 labels) datasets into a global model able to predict all 20 unique disease labels, and compared it with 'baseline' models trained individually on each dataset. On held-out test sets from NIH and CheXpert, the global model achieved average AUROCs of 0.75 and 0.74, respectively, compared with baseline average AUROCs of 0.81 and 0.71. On the MIMIC external test set, the global model generalized better, with an average AUROC of 0.80 compared with 0.74 and 0.76, respectively, for the baseline models. Our results show that surgical aggregation has the potential to produce clinically useful deep learning models by aggregating knowledge from distributed datasets with diverse tasks, a step towards bridging the gap from bench to bedside.
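The abstract does not spell out the aggregation rule, so the following is only a minimal sketch of one plausible way to merge classifier heads trained on different label sets: build the union of labels, then average each label's weights over the clients that actually train that label. The function name `aggregate_heads`, the toy label sets, and the toy weights are all hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def aggregate_heads(client_labels, client_heads):
    """Merge per-client classifier heads into one head over the label union.

    ASSUMPTION: this label-wise averaging is an illustrative rule, not the
    method from the paper.

    client_labels: list of label-name lists, one list per client.
    client_heads:  list of arrays, each of shape (n_client_labels, feature_dim),
                   with rows in the same order as that client's label list.
    Returns (global_labels, global_head); rows follow first-appearance order.
    """
    # Union of labels, keeping the order in which they are first seen.
    global_labels = []
    for labels in client_labels:
        for name in labels:
            if name not in global_labels:
                global_labels.append(name)

    feature_dim = client_heads[0].shape[1]
    sums = np.zeros((len(global_labels), feature_dim))
    counts = np.zeros(len(global_labels))

    # Accumulate each client's row into the slot of the matching global label.
    for labels, head in zip(client_labels, client_heads):
        for row, name in enumerate(labels):
            idx = global_labels.index(name)
            sums[idx] += head[row]
            counts[idx] += 1

    # Shared labels are averaged; labels owned by one client are copied as-is.
    return global_labels, sums / counts[:, None]

# Toy example: two clients that share exactly one label.
labels_a = ["Atelectasis", "Cardiomegaly", "Hernia"]
labels_b = ["Cardiomegaly", "Fracture"]
head_a = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
head_b = np.array([[4.0, 4.0], [5.0, 5.0]])

global_labels, global_head = aggregate_heads(
    [labels_a, labels_b], [head_a, head_b]
)
```

In this toy case the union has four labels (3 + 2 with 1 shared), mirroring how 14 NIH labels and 13 CheXpert labels yield 20 unique labels when 7 overlap. Only the shared label's weights are averaged; the others pass through unchanged.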