Models trained in federated settings often suffer from degraded performance and fail to generalize, especially in heterogeneous scenarios. In this work, we investigate such behavior through the lens of the geometry of the loss surface and of the Hessian eigenspectrum, linking the model's lack of generalization capacity to the sharpness of the solution. Motivated by prior studies connecting the sharpness of the loss surface to the generalization gap, we show that i) training clients locally with Sharpness-Aware Minimization (SAM) or its adaptive version (ASAM) and ii) Stochastic Weight Averaging (SWA) on the server side can substantially improve generalization in Federated Learning and help bridge the gap with centralized models. By seeking parameters in neighborhoods having uniformly low loss, the model converges towards flatter minima and its generalization significantly improves in both homogeneous and heterogeneous scenarios. Empirical results demonstrate the effectiveness of these optimizers across a variety of benchmark vision datasets (e.g., CIFAR10/100, Landmarks-User-160k, IDDA) and tasks (large-scale classification, semantic segmentation, domain generalization).
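For context, the sharpness-aware objective minimized locally by each client can be sketched as follows; this is the standard SAM formulation with neighborhood radius $\rho$ and its usual first-order approximation of the inner maximization, stated here as background rather than quoted from this abstract:
\[
\min_{w} \; \max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon),
\qquad
\hat{\epsilon}(w) \approx \rho \, \frac{\nabla L(w)}{\|\nabla L(w)\|_2},
\]
so that the client updates $w$ using the gradient evaluated at the perturbed point $w + \hat{\epsilon}(w)$; ASAM instead uses an adaptive, scale-invariant neighborhood in place of the fixed-radius ball.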