From Competition to Collaboration: Making Toy Datasets on Kaggle Clinically Useful for Chest X-Ray Diagnosis Using Federated Learning

Chest X-ray (CXR) datasets hosted on Kaggle, though useful from a data science competition standpoint, have limited clinical utility because each focuses narrowly on diagnosing one specific disease. In real-world clinical use, multiple diseases need to be considered since they can co-exist in the same patient. In this work, we demonstrate how federated learning (FL) can be used to make these toy CXR datasets from Kaggle clinically useful. Specifically, we train a single FL classification model ("global") capable of diagnosing both pneumonia and pneumothorax, two common and life-threatening conditions, using two separate CXR datasets: one annotated for the presence of pneumonia and the other for the presence of pneumothorax. We compare the performance of the global FL model with that of models trained separately on each dataset ("baseline") for two different model architectures. On a standard, naive 3-layer CNN architecture, the global FL model achieved AUROC of 0.84 and 0.81 for pneumonia and pneumothorax, respectively, compared to 0.85 and 0.82, respectively, for the corresponding baseline models (p>0.05). Similarly, on a pretrained DenseNet121 architecture, the global FL model achieved AUROC of 0.88 and 0.91 for pneumonia and pneumothorax, respectively, compared to 0.89 and 0.91, respectively, for the corresponding baseline models (p>0.05). Our results suggest that FL can be used to create global "meta" models that make toy datasets from Kaggle clinically useful, a step towards bridging the gap from bench to bedside.
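The abstract does not say which FL algorithm or framework was used. As a rough illustration only, the sketch below assumes federated averaging (FedAvg) and replaces the CNNs with a two-output logistic model on synthetic features. Each hypothetical client holds labels for only one condition and updates only the matching output column; a server then averages the client weights in proportion to sample count. All function names, data, and hyperparameters here are assumptions for illustration, not the authors' setup.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_update(weights, features, labels, label_index, lr=0.5, steps=20):
    """Gradient steps on one client, touching only the output it has labels for."""
    w = weights.copy()
    for _ in range(steps):
        probs = sigmoid(features @ w[:, label_index])
        grad = features.T @ (probs - labels) / len(labels)
        w[:, label_index] -= lr * grad
    return w

def fed_avg(client_weights, client_sizes):
    """Server step: average client weights, weighted by sample count."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

def make_client(rng, n, true_w):
    """Synthetic stand-in for one single-disease dataset (assumption)."""
    x = rng.normal(size=(n, true_w.shape[0]))
    y = (x @ true_w > 0).astype(float)
    return x, y

def accuracy(w, x, y, label_index):
    preds = sigmoid(x @ w[:, label_index]) > 0.5
    return float(np.mean(preds == (y > 0.5)))

rng = np.random.default_rng(0)
dim = 5
true_w = rng.normal(size=(dim, 2))

# Client 0 stands in for the pneumonia dataset (output 0),
# client 1 for the pneumothorax dataset (output 1).
clients = [
    (*make_client(rng, 200, true_w[:, 0]), 0),
    (*make_client(rng, 200, true_w[:, 1]), 1),
]

global_w = np.zeros((dim, 2))
for _ in range(10):  # communication rounds
    updates = [local_update(global_w, x, y, k) for x, y, k in clients]
    global_w = fed_avg(updates, [len(y) for _, y, _ in clients])

for x, y, k in clients:
    print(f"output {k}: training accuracy {accuracy(global_w, x, y, k):.2f}")
```

Because raw data never leaves a client in this scheme, only weight arrays are exchanged. A real replication would swap the linear model for the CNN or DenseNet121 mentioned above and evaluate with AUROC on held-out data.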

