The curation of large-scale, multi-institutional medical datasets needed to train deep learning models is challenged by the difficulty of sharing patient data in a privacy-preserving manner. Federated learning (FL), a paradigm that enables privacy-protected collaborative learning among institutions, is a promising solution to this challenge. However, FL generally suffers from performance deterioration due to heterogeneous data distributions across institutions and the lack of high-quality labeled data. In this paper, we present a robust and label-efficient self-supervised FL framework for medical image analysis. Specifically, we introduce a novel distributed self-supervised pre-training paradigm into the existing FL pipeline (i.e., pre-training the models directly on the decentralized target task datasets). Building on the recent success of Vision Transformers, we employ masked image encoding as the self-supervised pre-training task to facilitate more effective knowledge transfer to downstream federated models. Extensive empirical results on simulated and real-world medical imaging federated datasets show that self-supervised pre-training largely improves the robustness of federated models against various degrees of data heterogeneity. Notably, under severe data heterogeneity, our method, without relying on any additional pre-training data, achieves improvements of 5.06%, 1.53%, and 4.58% in test accuracy on retinal, dermatology, and chest X-ray classification, respectively, compared with the supervised baseline with ImageNet pre-training. Moreover, we show that our self-supervised FL algorithm generalizes well to out-of-distribution data and learns federated models more effectively in limited-label scenarios, surpassing the supervised baseline by 10.36% and the semi-supervised FL method by 8.3% in test accuracy.
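To make the described pipeline concrete, the following is a minimal sketch (not the paper's implementation) of federated self-supervised pre-training: each client trains an encoder with a masked-reconstruction objective on its own data, and a server averages the weights with FedAvg before the next round. The class and function names (`TinyMAE`, `local_pretrain`, `fedavg`), the toy synthetic data, and all hyperparameters are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of federated masked-image pre-training with FedAvg.
import copy
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    """Toy masked autoencoder: masks pixels of a flattened image and reconstructs them."""
    def __init__(self, dim=16 * 16, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x, mask_ratio=0.75):
        mask = (torch.rand_like(x) < mask_ratio).float()      # 1 = masked position
        recon = self.decoder(self.encoder(x * (1 - mask)))    # encode only visible pixels
        # Reconstruction loss is computed only on the masked positions.
        return ((recon - x) ** 2 * mask).sum() / mask.sum().clamp(min=1)

def local_pretrain(model, data, epochs=1, lr=1e-2):
    """One client's local self-supervised pre-training; returns its updated weights."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = model(data)
        loss.backward()
        opt.step()
    return model.state_dict()

def fedavg(states):
    """Server-side averaging of client weights (uniform FedAvg)."""
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = torch.stack([s[key] for s in states]).mean(dim=0)
    return avg

# Toy federated setup: 3 clients with heterogeneous synthetic "images".
global_model = TinyMAE()
client_data = [torch.randn(32, 16 * 16) + shift for shift in (0.0, 1.0, -1.0)]
for _round in range(5):
    states = [local_pretrain(copy.deepcopy(global_model), data) for data in client_data]
    global_model.load_state_dict(fedavg(states))
# The pre-trained encoder would then initialize the downstream federated classifier.
```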