Federated learning enables multiple institutions to collaboratively train machine learning models on their local data in a privacy-preserving manner. However, its distributed nature often leads to significant heterogeneity in data distributions across institutions. In this paper, we investigate the deleterious impact of a taxonomy of data heterogeneity regimes on federated learning methods, including quantity skew, label distribution skew, and imaging acquisition skew. We show that model performance degrades as the degree of data heterogeneity increases. We present several mitigation strategies to overcome the resulting performance drops, including weighted averaging for quantity skew, and weighted loss and batch normalization averaging for label distribution skew. The proposed optimizations improve the ability of federated learning methods to handle heterogeneity across institutions, providing valuable guidance for deploying federated learning in real clinical applications.
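As a concrete illustration of the quantity-skew mitigation mentioned above, the sketch below shows FedAvg-style aggregation in which each institution's model parameters are weighted by its local sample count, so that data-rich sites are not diluted by a plain unweighted mean. This is a minimal sketch under assumed names (aggregate_weighted, client_states, client_sizes) and an assumed PyTorch setup; it is not the paper's implementation.

```python
# Minimal sketch: sample-count-weighted aggregation of client model parameters
# (FedAvg-style), as a mitigation for data quantity skew.
# All names here are illustrative assumptions, not the paper's code.

from typing import Dict, List
import torch


def aggregate_weighted(client_states: List[Dict[str, torch.Tensor]],
                       client_sizes: List[int]) -> Dict[str, torch.Tensor]:
    """Average client parameters, weighting each client by its local sample count."""
    total = float(sum(client_sizes))
    weights = [n / total for n in client_sizes]

    global_state: Dict[str, torch.Tensor] = {}
    for key in client_states[0]:
        # Weighted sum of the same parameter tensor across all clients.
        global_state[key] = sum(w * state[key].float()
                                for w, state in zip(weights, client_states))
    return global_state


if __name__ == "__main__":
    # Two toy "clients" with very different data quantities (quantity skew).
    model_a = {"linear.weight": torch.ones(2, 2), "linear.bias": torch.zeros(2)}
    model_b = {"linear.weight": 3 * torch.ones(2, 2), "linear.bias": torch.ones(2)}
    merged = aggregate_weighted([model_a, model_b], client_sizes=[900, 100])
    print(merged["linear.weight"])  # pulled toward client A, which holds 9x more data
```

Weighting by sample count keeps the aggregated model from being skewed toward small institutions; analogous reweighting at the loss level (and separate handling of batch normalization statistics) is the corresponding mitigation sketched for label distribution skew.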