Training Deep Learning (DL) models requires large, high-quality datasets, often assembled from data held by different institutions. Federated Learning (FL) has emerged as a method for privacy-preserving pooling of datasets: participants collaboratively train a model by iteratively aggregating locally trained models into a global one. A critical performance challenge for FL is operating on data that are not independently and identically distributed (non-IID) among the federation participants. Although this fragility cannot be eliminated, it can be mitigated by suitably tuning two hyper-parameters: the choice of normalization layer and the collaboration (aggregation) frequency. In this work, we benchmark five different normalization layers for training Neural Networks (NNs), under two families of non-IID data skew and on two datasets. Results show that Batch Normalization, widely employed in centralized DL, is not the best choice for FL, whereas Group and Layer Normalization consistently outperform it. Likewise, overly frequent model aggregation slows convergence and degrades model quality.
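The two hyper-parameters studied above can be made concrete with a small sketch. The snippet below, a minimal PyTorch illustration and not the paper's actual experimental code, shows (a) a hypothetical `make_norm` helper that swaps the normalization layer of a small CNN (the exact five variants benchmarked are an assumption here), and (b) a uniform FedAvg-style aggregation of client models, whose frequency would be the collaboration hyper-parameter.

```python
import torch
import torch.nn as nn

def make_norm(kind: str, channels: int) -> nn.Module:
    """Return a normalization layer of the requested kind (illustrative subset)."""
    if kind == "batch":
        return nn.BatchNorm2d(channels)
    if kind == "group":
        return nn.GroupNorm(num_groups=min(32, channels), num_channels=channels)
    if kind == "layer":
        # GroupNorm with a single group normalizes over (C, H, W),
        # i.e. LayerNorm as commonly used for conv features.
        return nn.GroupNorm(num_groups=1, num_channels=channels)
    if kind == "instance":
        return nn.InstanceNorm2d(channels)
    if kind == "none":
        return nn.Identity()
    raise ValueError(f"unknown normalization: {kind}")

class SmallCNN(nn.Module):
    """Toy model whose normalization layers are a constructor argument."""
    def __init__(self, norm: str = "group", num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), make_norm(norm, 32), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), make_norm(norm, 64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

def fedavg(state_dicts):
    """Uniform FedAvg sketch: average each parameter across client models.
    Note: with BatchNorm, integer buffers such as num_batches_tracked are
    averaged as floats here; a real implementation would handle them separately.
    """
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }
```

Varying the `norm` argument reproduces the comparison in the abstract, while calling `fedavg` after every local step versus after many local epochs corresponds to the collaboration-frequency hyper-parameter.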