Federated learning is an emerging research paradigm that enables the collaborative training of deep learning models without sharing patient data. However, data distributions usually differ across institutions, which can degrade the performance of models trained with federated learning. In this study, we propose a novel heterogeneity-aware federated learning method, SplitAVG, to overcome the performance drops caused by data heterogeneity in federated learning. Unlike previous federated methods that require complex heuristic training or hyperparameter tuning, SplitAVG leverages simple network splitting and feature map concatenation strategies to encourage the federated model to train an unbiased estimator of the target data distribution. We compare SplitAVG with seven state-of-the-art federated learning methods, using models trained on centrally hosted data as the baseline, on a suite of both synthetic and real-world federated datasets. We find that the performance of models trained with all comparison federated learning methods degrades significantly as the degree of data heterogeneity increases. In contrast, SplitAVG achieves results comparable to the baseline under all heterogeneous settings: on highly heterogeneous data partitions, it reaches 96.2% of the accuracy obtained by the baseline on a diabetic retinopathy binary classification dataset and 110.4% of the baseline's mean absolute error on a bone age prediction dataset. We conclude that SplitAVG can effectively overcome the performance drops caused by variability in data distributions across institutions. Experimental results also show that SplitAVG can be adapted to different base networks and generalized to various types of medical imaging tasks.
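To make the network-split and feature-map-concatenation idea concrete, below is a minimal PyTorch-style sketch. It is an illustrative assumption of how such a scheme could look, not the paper's implementation: the layer choices, the cut point, and the two-institution setup are hypothetical, and the shared `front` module stands in for the institution-side copies that would exist in a real federated deployment.

```python
import torch
import torch.nn as nn

# Hypothetical base network split at a cut layer: a "front" part that would run
# locally at each institution and a "back" part that would run on the server.
front = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
back = nn.Sequential(nn.Flatten(), nn.Linear(16 * 32 * 32, 2))

# Each institution forwards its own local mini-batch through the front part only,
# so raw patient images never leave the institution.
batch_a = torch.randn(8, 3, 32, 32)  # institution A's local mini-batch (synthetic data)
batch_b = torch.randn(8, 3, 32, 32)  # institution B's local mini-batch (synthetic data)
feat_a = front(batch_a)
feat_b = front(batch_b)

# The server concatenates the received feature maps along the batch dimension and
# continues the forward pass through the back part on the combined representation,
# so the gradient reflects data from all participating institutions.
features = torch.cat([feat_a, feat_b], dim=0)
logits = back(features)
print(logits.shape)  # torch.Size([16, 2])
```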