AI-assisted characterization of chest x-rays (CXR) has the potential to provide substantial benefits across many clinical applications. Many large-scale public CXR datasets have been curated for detection of abnormalities using deep learning. However, each of these datasets focuses on detecting only a subset of the disease labels that could be present in a CXR, which limits its clinical utility. Furthermore, the distributed nature of these datasets, along with data sharing regulations, makes it difficult to pool them and create a complete representation of disease labels. We propose surgical aggregation, a federated learning framework for aggregating knowledge from distributed datasets with different disease labels into a 'global' deep learning model. We randomly divided the NIH Chest X-Ray 14 dataset into training (70%), validation (10%), and test (20%) splits with no patient overlap and conducted two experiments. In the first experiment, we pruned the disease labels to create two 'toy' datasets containing 11 and 8 labels respectively, with 4 overlapping labels. In the second experiment, we pruned the disease labels to create two disjoint 'toy' datasets with 7 labels each. In both experiments, the surgically aggregated 'global' model performed comparably to a 'baseline' model trained on the complete set of disease labels: the overlapping and disjoint experiments yielded AUROCs of 0.87 and 0.86 respectively, compared to the baseline AUROC of 0.87. We also used surgical aggregation to harmonize the NIH Chest X-Ray 14 and CheXpert datasets into a 'global' model with AUROCs of 0.85 and 0.83 respectively. Our results show that surgical aggregation could be used to develop clinically useful deep learning models by aggregating knowledge from distributed datasets with diverse tasks, a step forward towards bridging the gap from bench to bedside.
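The 70/10/20 split with no patient overlap described in the abstract can be illustrated with a minimal sketch. The function name `split_by_patient`, the seed, and the choice to apply the fractions to patients rather than to images are assumptions made for illustration; the abstract does not specify how the split was implemented.

```python
import random


def split_by_patient(patient_ids, fractions=(0.7, 0.1, 0.2), seed=0):
    """Assign image indices to train/val/test so that no patient spans two splits.

    patient_ids: one patient identifier per image, in image order.
    fractions:   share of *patients* (an assumption, not images) per split.
    """
    # Shuffle the unique patients deterministically.
    unique = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)

    # Cut the shuffled patient list at the 70% and 80% marks.
    n_train = round(fractions[0] * len(unique))
    n_val = round(fractions[1] * len(unique))
    split_of = {}
    for i, pid in enumerate(unique):
        if i < n_train:
            split_of[pid] = "train"
        elif i < n_train + n_val:
            split_of[pid] = "val"
        else:
            split_of[pid] = "test"

    # Every image follows its patient into that patient's split.
    splits = {"train": [], "val": [], "test": []}
    for idx, pid in enumerate(patient_ids):
        splits[split_of[pid]].append(idx)
    return splits


# Hypothetical toy data: 10 patients with 5 images each.
patient_ids = [i % 10 for i in range(50)]
splits = split_by_patient(patient_ids)
```

Splitting at the patient level rather than the image level keeps images of the same patient from appearing in both training and test data, which would otherwise inflate test performance.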